mirrors/git - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jonathan Tan	1f2be8bed6	index-pack: teach --promisor to forbid pack name Currently, - Running "index-pack --promisor" outside a repo segfaults. - It may be confusing to a user that running "index-pack --promisor" within a repo may make changes to the repo's object DB, especially since the packs indexed by the index-pack invocation may not even be related to the repo. As discussed in [1] and [2], teaching --promisor to forbid a packfile name solves both these problems. This combination of arguments requires a repo (since we are writing the resulting .pack and .idx to it) and it is clear that the files are related to the repo. Currently, Git uses "index-pack --promisor" only when fetching into a repo, so it could be argued that we should teach "index-pack" a new argument (say, "--fetching-mode") instead of tying --promisor to a generic argument like the packfile name. However, this --promisor feature could conceivably be used whenever we have a packfile that is known to come from the promisor remote (whether obtained through Git's fetch protocol or through other means) so not using a new argument seems reasonable - one could envision a user-made script obtaining a packfile and then running "index-pack --promisor --stdin", for example. In fact, it might be possible to relax the restriction further (say, by also allowing --promisor when indexing a packfile that is in the object DB), but relaxing the restriction is backwards-compatible so we can revisit that later. One thing to watch out for is the possibility of a future Git feature that indexes a pack in the context of a repo, but does not necessarily write the resulting pack to it (and does not necessarily desire to make any changes to the object DB). One such feature would be fetch quarantine, which might need the repo context in order to detect hash collisions, but would also need to ensure that the object DB is undisturbed in case the fetch fails for whatever reason, even if the reason occurs only after the indexing is complete. It may not be obvious to the implementer of such a feature that "index-pack" could sometimes write packs other than the indexed pack to the object DB, but there are already other ways that "fetch" could write to the object DB (in particular, packfile URIs and bundle URIs), so hopefully the implementation of this future feature would already include a test that the object DB be undisturbed. This change requires the change to t5300 by `1f52cdfacb` (index-pack: document and test the --promisor option, 2022-03-09) to be undone. (--promisor is already tested indirectly, so we don't need the explicit test here any more.) [1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/ [2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-20 10:37:56 +09:00
Patrick Steinhardt	656ca9204a	builtin/gc: provide hint when maintenance hits a stale schedule lock When running scheduled maintenance via `git maintenance start`, we acquire a lockfile to ensure that no other scheduled maintenance task is running in the repository concurrently. If so, we do provide an error to the user hinting that another process seems to be running in this repo. There are two important cases why such a lockfile may exist: - An actual git-maintenance(1) process is still running in this repository. - An earlier process may have crashed or was interrupted part way through and has left a stale lockfile behind. In `c95547a394` (builtin/gc: fix crash when running `git maintenance start`, 2024-10-10), we have fixed an issue where git-maintenance(1) would crash with the "start" subcommand, and the underlying bug causes the second scenario to trigger quite often now. Most users don't know how to get out of that situation again though. Ideally, we'd be removing the stale lock for our users automatically. But in the context of repository maintenance this is rather risky, as it can easily run for hours or even days. So finding a clear point where we know that the old process has exited is basically impossible. We have the same issue in other subsystems, e.g. when locking refs. Our lockfile interfaces thus provide the `unable_to_lock_message()` function for exactly this purpose: it provides a nice hint to the user that explains what is going on and how to get out of that situation again by manually removing the file. Adapt git-maintenance(1) to print a similar hint. While we could use the above function, we can provide a bit more context as we know exactly what kind of process would create the lockfile. Reported-by: Miguel Rincon Barahona <mrincon@gitlab.com> Reported-by: Kev Kloss <kkloss@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-20 10:26:12 +09:00
Jean-Noël Avila	f3b2ceea39	doc: git-diff: apply format changes to config part By the way, we also change the sentences where git-diff would refer to itself, so that no link is created in the HTML output. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:31:05 +09:00
Jean-Noël Avila	0b080a70ab	doc: git-diff: apply format changes to diff-generate-patch Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:31:05 +09:00
Jean-Noël Avila	6ace09b2f9	doc: git-diff: apply format changes to diff-format Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:31:04 +09:00
Jean-Noël Avila	6b552e39c0	doc: git-diff: apply format changes to diff-options The format change is only applied to the sections of the file that are filtered in git-diff. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:31:04 +09:00
Jean-Noël Avila	e72c2d2e91	doc: git-diff: apply new documentation guidelines The documentation for git-diff has been updated to follow the new documentation guidelines. The following changes have been applied to the series of patches: - switching the synopsis to a synopsis block which will automatically format placeholders in italics and keywords in monospace - use _<placeholder>_ instead of <placeholder> in the description - use `backticks for keywords and more complex option descriptions`. The new rendering engine will apply synopsis rules to these spans. - prevent git-diff from self-referencing itself via gitlink macro when the generated link would point to the same page. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:31:04 +09:00
Junio C Hamano	b8558e6abd	Merge branch 'ps/reftable-detach' into ps/reftable-iterator-reuse * ps/reftable-detach: reftable/system: provide thin wrapper for lockfile subsystem reftable/stack: drop only use of `get_locked_file_path()` reftable/system: provide thin wrapper for tempfile subsystem reftable/stack: stop using `fsync_component()` directly reftable/system: stop depending on "hash.h" reftable: explicitly handle hash format IDs reftable/system: move "dir.h" to its only user	2024-11-19 12:24:33 +09:00
Patrick Steinhardt	988e7f5e95	reftable/system: provide thin wrapper for lockfile subsystem We use the lockfile subsystem to write lockfiles for "tables.list". As with the tempfile subsystem, the lockfile subsystem also hooks into our infrastructure to prune stale locks via atexit(3p) or signal handlers. Furthermore, the lockfile subsystem also handles locking timeouts, which do add quite a bit of logic. Having to reimplement that in the context of Git wouldn't make a whole lot of sense, and it is quite likely that downstream users of the reftable library may have a better idea for how exactly to implement timeouts. So again, provide a thin wrapper for the lockfile subsystem instead such that the compatibility shim is fully self-contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:11 +09:00
Patrick Steinhardt	6361226b79	reftable/stack: drop only use of `get_locked_file_path()` We've got a single callsite where we call `get_locked_file_path()`. As we're about to convert our usage of the lockfile subsystem to instead be used via a compatibility shim we'd have to implement more logic for this single callsite. While that would be okay if Git was the only supposed user of the reftable library, it's a bit more awkward when considering that we have to reimplement this functionality for every user of the library eventually. Refactor the code such that we don't call `get_locked_file_path()` anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:10 +09:00
Patrick Steinhardt	01e49941d6	reftable/system: provide thin wrapper for tempfile subsystem We use the tempfile subsystem to write temporary tables, but given that we're in the process of converting the reftable library to become standalone we cannot use this subsystem directly anymore. While we could in theory convert the code to use mkstemp(3p) instead, we'd lose access to our infrastructure that automatically prunes tempfiles via atexit(3p) or signal handlers. Provide a thin wrapper for the tempfile subsystem instead. Like this, the compatibility shim is fully self-contained in "reftable/system.c". Downstream users of the reftable library would have to implement their own tempfile shims by replacing "system.c" with a custom version. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:10 +09:00
Patrick Steinhardt	86b770b0bb	reftable/stack: stop using `fsync_component()` directly We're executing `fsync_component()` directly in the reftable library so that we can fsync data to disk depending on "core.fsync". But as we're in the process of converting the reftable library to become standalone we cannot use that function in the library anymore. Refactor the code such that users of the library can inject a custom fsync function via the write options. This allows us to get rid of the dependency on "write-or-die.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:10 +09:00
Patrick Steinhardt	c2f08236ed	reftable/system: stop depending on "hash.h" We include "hash.h" in "reftable/system.h" such that we can use hash format IDs as well as the raw size of SHA1 and SHA256. As we are in the process of converting the reftable library to become standalone we of course cannot rely on those constants anymore. Introduce a new `enum reftable_hash` to replace internal uses of the hash format IDs and new constants that replace internal uses of the hash size. Adapt the reftable backend to set up the correct hash function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:10 +09:00
Patrick Steinhardt	88e297275b	reftable: explicitly handle hash format IDs The hash format IDs are used for two different things across the reftable codebase: - They are used as a 32 bit unsigned integer when reading and writing the header in order to identify the hash function. - They are used internally to identify which hash function is in use. When one only considers the second usecase one might think that one can easily change the representation of those hash IDs. But because those IDs end up in the reftable header and footer on disk it is important that those never change. Create separate constants `REFTABLE_FORMAT_ID_` and use them in contexts where we read or write reftable headers. This serves multiple purposes: - It allows us to more easily discern cases where we actually use those constants for the on-disk format. - It detangles us from the same constants that are defined in libgit.a, which is another required step to convert the reftable library to become standalone. - It makes the next step easier where we stop using `GIT__FORMAT_ID` constants in favor of a custom enum. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:09 +09:00
Patrick Steinhardt	17e8039878	reftable/system: move "dir.h" to its only user We still include "dir.h" in "reftable/system.h" even though it is not used by anything but by a single unit test. Move it over into that unit test so that we don't accidentally use any functionality provided by it in the reftable codebase. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:08 +09:00
Elijah Newren	5e904f1a4a	fast-import: avoid making replace refs point to themselves If someone replaces a commit with a modified version, then builds on that commit, and then later decides to rewrite history in a format like git fast-export --all \| CMD_TO_TWEAK_THE_STREAM \| git fast-import and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the replacement did, then at the end you'd get a replace ref that points to itself. For example: $ git show-ref \| grep replace fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264 Git commands which pay attention to replace refs will die with an error when a self-referencing replace ref is present: $ git log fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 Avoid such problems by deleting replace refs that will simply end up pointing to themselves at the end of our writing. Unless users specify --quiet, warn them when we delete such a replace ref. Two notes about this patch: * We are not ignoring the problematic update of the replace ref (turning it into a no-op), we are replacing the update with a delete. The logic here is that if the repository had a value for the replace ref before fast-import was run, and the replace ref was explicitly named in the fast-import stream, we don't want the replace ref to be left with a pre-fast-import value. * While loops with more than one element (e.g. refs/replace/A points to B, and refs/replace/B points to A) are possible, they seem much less plausible. It is pretty easy to create a sequence of git-filter-repo commands that will trigger a self-referencing replace ref, but I do not know how to trigger a scenario with a cycle length greater than 1. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 09:39:33 +09:00
Jeff King	2af8ead52b	object-file: inline empty tree and blob literals We define macros with the bytes of the empty trees and blobs for sha1 and sha256. But since `e1ccd7e2b1` (sha1_file: only expose empty object constants through git_hash_algo, 2018-05-02), those are used only for initializing the git_hash_algo entries. Any other code using the macros directly would be suspicious, since a hash_algo pointer is the level of indirection we use to make everything work with both sha1 and sha256. So let's future proof against code doing the wrong thing by dropping the macros entirely and just initializing the structs directly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 21:48:48 +09:00
Jeff King	e37feea00b	object-file: treat cached_object values as const The cached-object API maps oids to in-memory entries. Once inserted, these entries should be immutable. Let's return them from the find_cached_object() call with a const tag to make this clear. Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 21:48:48 +09:00
Jeff King	9202ffcf10	object-file: drop oid field from find_cached_object() return value The pretend_object_file() function adds to an array mapping oids to object contents, which are later retrieved with find_cached_object(). We naturally need to store the oid for each entry, since it's the lookup key. But find_cached_object() also returns a hard-coded empty_tree object. There we don't care about its oid field and instead compare against the_hash_algo->empty_tree. The oid field is left as all-zeroes. This all works, but it means that the cached_object struct we return from find_cached_object() may or may not have a valid oid field, depend whether it is the hard-coded tree or came from pretend_object_file(). Nobody looks at the field, so there's no bug. But let's future-proof it by returning only the object contents themselves, not the oid. We'll continue to call this "struct cached_object", and the array entry mapping the key to those contents will be a "cached_object_entry". This would also let us swap out the array for a better data structure (like a hashmap) if we chose, but there's not much point. The only code that adds an entry is git-blame, which adds at most a single entry per process. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 21:48:48 +09:00
Jeff King	b2a95dfd63	object-file: move empty_tree struct into find_cached_object() The fake empty_tree struct is a static global, but the only code that looks at it is find_cached_object(). The struct itself is a little odd, with an invalid "oid" field that is handled specially by that function. Since it's really just an implementation detail, let's move it to a static within the function. That future-proofs against other code trying to use it and seeing the weird oid value. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 21:48:47 +09:00
Jeff King	2911f9ed1e	object-file: drop confusing oid initializer of empty_tree struct We treat the empty tree specially, providing an in-memory "cached" copy, which allows you to diff against it even if the object doesn't exist in the repository. This is implemented as part of the larger cached_object subsystem, but we use a stand-alone empty_tree struct. We initialize the oid of that struct using EMPTY_TREE_SHA1_BIN_LITERAL. At first glance, that seems like a bug; how could this ever work for sha256 repositories? The answer is that we never look at the oid field! The oid field is used to look up entries added by pretend_object_file() to the cached_objects array. But for our stand-alone entry, we look for it independently using the_hash_algo->empty_tree, which will point to the correct algo struct for the repository. This happened in `62ba93eaa9` (sha1_file: convert cached object code to struct object_id, 2018-05-02), which even mentions that this field is never used. Let's reduce confusion for anybody reading this code by replacing the sha1 initializer with a comment. The resulting field will be all-zeroes, so any violation of our assumption that the oid field is not used will break equally for sha1 and sha256. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 21:48:47 +09:00
Jeff King	e770f36307	object-file: prefer array-of-bytes initializer for hash literals We hard-code a few well-known hash values for empty trees and blobs in both sha1 and sha256 formats. We do so with string literals like this: #define EMPTY_TREE_SHA256_BIN_LITERAL \ "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \ "\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \ "\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \ "\x53\x21" and then use it to initialize the hash field of an object_id struct. That hash field is exactly 32 bytes long (the size we need for sha256). But the string literal above is actually 33 bytes long due to the NUL terminator. This is legal in C, and the NUL is ignored. Side note on legality: in general excess initializer elements are forbidden, and gcc will warn on both of these: char foo[3] = { 'h', 'u', 'g', 'e' }; char bar[3] = "VeryLongString"; I couldn't find specific language in the standard allowing initialization from a string literal where _just_ the NUL is ignored, but C99 section 6.7.8 (Initialization), paragraph 32 shows this exact case as "example 8". However, the upcoming gcc 15 will start warning for this case (when compiled with -Wextra via DEVELOPER=1): CC object-file.o object-file.c:52:9: warning: initializer-string for array of ‘unsigned char’ is too long [-Wunterminated-string-initialization] 52 \| "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ object-file.c:79:17: note: in expansion of macro ‘EMPTY_TREE_SHA256_BIN_LITERAL’ which is understandable. Even though this is not a bug for us, since we do not care about the NUL terminator (and are just using the literal as a convenient format), it would be easy to accidentally create an array that was mistakenly unterminated. We can avoid this warning by switching the initializer to an actual array of unsigned values. That arguably demonstrates our intent more clearly anyway. Reported-by: Sam James <sam@gentoo.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 21:48:47 +09:00
Patrick Steinhardt	5dac35bbde	Makefile: let clar header targets depend on their scripts The targets that generate clar headers depend on their source files, but not on the script that is actually generating the output. Fix the issue by adding the missing dependencies. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:59:26 +09:00
Patrick Steinhardt	8caa7b9b05	cmake: use verbatim arguments when invoking clar commands Pass the VERBATIM option to `add_custom_command()`. Like this, all arguments to the commands will be escaped properly for the build tool so that the invoked command receives each argument unchanged. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:59:26 +09:00
Patrick Steinhardt	8839dccc8d	cmake: use SH_EXE to execute clar scripts In `30bf9f0aaa` (cmake: set up proper dependencies for generated clar headers, 2024-10-21), we have deduplicated the logic to generate our clar headers by reusing the same scripts that our Makefile does. Despite the deduplication, this refactoring also made us rebuild the headers in case the source files change, which didn't happen previously. The commit also introduced an issue though: we execute the scripts directly, so when the host does not have "/bin/sh" available they will fail. This is for example the case on Windows when importing the CMake project into Microsoft Visual Studio. Address the issue by invoking the scripts with `SH_EXE`, which contains the discovered path of the shell interpreter. While at it, wrap the overly long lines in the CMake build instructions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:59:25 +09:00
Patrick Steinhardt	9a91ab9400	t/unit-tests: convert "clar-generate.awk" into a shell script Convert "clar-generate.awk" into a shell script that invokes awk(1). This allows us to avoid the shell redirect in the build system, which may otherwise be a problem with build systems on platforms that use a different shell. While at it, wrap the overly long lines in the CMake build instructions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:59:25 +09:00
Kristoffer Haugsbakk	820fd1a569	Documentation/git-bundle.txt: discuss naïve backups It might be naïve to think that those who need this education would end up here in the first place. But I think it’s good to mention this high-level concept here on a command which provides a backup strategy. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:56:26 +09:00
Kristoffer Haugsbakk	c43a67f83d	Documentation/git-bundle.txt: mention --all in spec. refs Mention `--all` as an alternative in “Specifying References”. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:56:25 +09:00
Kristoffer Haugsbakk	f27b48d904	Documentation/git-bundle.txt: remove old `--all` example We don’t need this part now that we have a fleshed-out `--all` example. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:56:25 +09:00
Kristoffer Haugsbakk	df0cf6faad	Documentation/git-bundle.txt: mention full backup example Provide an example about how to make a “full backup” with caveats about what that means in this case. This is a requested use-case.[1] But the doc is a bit unassuming about it: If you want to match `git clone --mirror`, which would include your refs such as `refs/remotes/*`, use `--all`. The user cannot be expected to formulate “I want a full backup” as “I want to match `git clone --mirror`” for a bundle file or something. Let’s drop this mention of `--all` later in the doc and frontload it. † 1: E.g.: • https://stackoverflow.com/questions/5578270/fully-backup-a-git-repo • https://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:56:25 +09:00
brian m. carlson	639cd8db63	reflog: rename unreachable In C23, "unreachable" is a macro that invokes undefined behavior if it is invoked. To make sure that our code compiles on a variety of C versions, rename unreachable to "is_unreachable". Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:42:08 +09:00
brian m. carlson	e8b3bcf491	index-pack: rename struct thread_local "thread_local" is a keyword in C23. To make sure that our code compiles on a wide variety of C versions, rename struct thread_local to "struct thread_local_data" to avoid a conflict. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:42:08 +09:00
Kristoffer Haugsbakk	68e3c69efa	Documentation/glossary: describe "trailer" Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:41:24 +09:00
Junio C Hamano	090d24e9af	Clean up RelNotes for 2.48 There somehow ended up too many bogus "merge X later to maint" comments for topics that cannot be merged ever down to 'maint' because they were forked from more recent integration branches in the draft release notes. Remove them, as they are inviting for mistakes later. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-16 02:27:40 +09:00
brian m. carlson	0ffb5a6bf1	Allow cloning from repositories owned by another user Historically, Git has allowed users to clone from an untrusted repository, and we have documented that this is safe to do so: `upload-pack` tries to avoid any dangerous configuration options or hooks from the repository it's serving, making it safe to clone an untrusted directory and run commands on the resulting clone. However, this was broken by `f4aa8c8bb1` ("fetch/clone: detect dubious ownership of local repositories", 2024-04-10) in an attempt to make things more secure. That change resulted in a variety of problems when cloning locally and over SSH, but it did not change the stated security boundary. Because the security boundary has not changed, it is safe to adjust part of the code that patch introduced. To do that and restore the previous functionality, adjust enter_repo to take two flags instead of one. The two bits are - ENTER_REPO_STRICT: callers that require exact paths (as opposed to allowing known suffixes like ".git", ".git/.git" to be omitted) can set this bit. Corresponds to the "strict" parameter that the flags word replaces. - ENTER_REPO_ANY_OWNER_OK: callers that are willing to run without ownership check can set this bit. The former is --strict-paths option of "git daemon". The latter is set only by upload-pack, which honors the claimed security boundary. Note that local clones across ownership boundaries require --no-local so that upload-pack is used. Document this fact in the manual page and provide an example. This patch was based on one written by Junio C Hamano. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-15 11:05:06 +09:00
Taylor Blau	e199290592	pack-objects: only perform verbatim reuse on the preferred pack When reusing objects from source pack(s), write_reused_pack_verbatim() is responsible for reusing objects whole eword_t's at a time. It works by taking the longest continuous run of objects from the beginning of each source pack that the caller wants, and reuses the entirety of that section from each pack. This is based on the assumption that we don't have any gaps within the region. This assumption relieves us from having to patch any OFS_DELTAs, since we know that there aren't any gaps between any delta and its base in that region. To illustrate why this assumption is necessary, suppose we have some pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was selected from a pack other than P, then the bit corresponding to object Y will appear earlier in the bitmap than the bits corresponding to X and Z. If pack-objects already has or will use the copy of Y from the pack it was selected from in the MIDX, then it is an error to reuse all objects between X and Z in the source pack. Doing so will cause us to reuse Y from a different pack than the one which represents Y in the MIDX, causing us to either: - include the object twice, assuming that the caller wants Y in the pack, or - include the object once, resulting in us packing more objects than necessary. This regression comes from `ca0fd69e37` (pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which incorrectly assumed that there would be no gaps in reusable regions of non-preferred packs. Instead, we can only safely perform the whole-word reuse optimization on the preferred pack, where we know with certainty that no gaps exist in that region of the bitmap. We can still reuse objects from non-preferred packs, but we have to inspect them individually in write_reused_pack() to ensure that any gaps that may exist are accounted for. This allows us to simplify the implementation of write_reused_pack_verbatim() back to almost its pre-multi-pack reuse form, since we can now assume that the beginning of the pack appears at the beginning of the bitmap, meaning that we don't have to account for any bits up to the first word boundary (like we had to special case in `ca0fd69e37`). The only significant changes from the pre-ca0fd69e37 implementation are: - that we can no longer inspect words up to the end of reuse_packfile_bitmap->word_alloc, since we only want to look at words whose bits all correspond to objects in the given packfile, and - that we return early when given a reuse_packfile which is not preferred, making the call a noop. In the future, it might be possible to restore this optimization if we could guarantee that some reuse packs don't contain any gaps by construction (similar to the "disjoint packs" idea in very early versions of multi-pack reuse). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-15 09:13:31 +09:00
Taylor Blau	57f35cfd7c	t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure In the multi-pack reuse code, there are two paths for reusing the on-disk representation of an object, handled by: - builtin/pack-objects.c::write_reused_pack_one() - builtin/pack-objects.c::write_reused_pack_verbatim() The former is responsible for copying the bytes for a single object out of an existing source pack. The latter does the same but for a region of objects aligned at eword_t boundaries. Demonstrate a bug whereby write_reused_pack_verbatim() can be tricked into writing out objects from some source pack, even when those objects were selected from a different source pack in the MIDX bitmap. When the caller wants at least one of the objects in that region, pack-objects will write the same object twice as a result of this bug. In the other case where the caller doesn't want any of the objects in the region of interest, we will write out objects that weren't requested. Demonstrate this bug by creating two packs, where the preferred one of those packs contains a single object which also appears in the main (non-preferred) pack. A separate bug[^1] prevents us from triggering the main bug when the duplicated object is the last one in the main pack, but any earlier object will suffice. We could fix that separate bug, but the following commit will simplify write_reused_pack_verbatim() and only call it on the preferred pack, so doing so would have little point. [^1]: Because write_reused_pack_verbatim() only reuses bits in the range off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0); off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p, pos - reuse_packfile->bitmap_pos); written += pos - reuse_packfile->bitmap_pos; /* We're recording one chunk, not one object. */ record_reused_object(pack_start_off, pack_start_off - (hashfile_total(out) - pack_start)); , or in other words excluding the object beginning at position 'pos - reuse_packfile->bitmap_pos' in the source pack. But since reuse_packfile->bitmap_pos is '1' in the non-preferred pack (accounting for the single-object pack which is preferred), we don't actually copy the bytes from the last object. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-15 09:13:31 +09:00
Karthik Nayak	b886db48c6	refs: don't invoke reference-transaction hook for reflogs The reference-transaction hook is invoked whenever there is a reference update being performed. For each state of the transaction, we iterate over the updates present and pass this information to the hook. The `ref_update` structure is used to hold these updates within a `transaction`. We use the same structure for holding reflog updates too. Which means that the reference transaction hook is also obtaining information about a reflog update. This is a bug, since: - The hook is designed to work with reference updates and reflogs updates are different. - The hook doesn't have the required information to distinguish reference updates from reflog updates. This is particularly evident when the default branch (pointed by HEAD) is updated, we see that the hook also receives information about HEAD being changed. In reality, we only add a reflog update for HEAD, while HEAD's values remains the same. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-15 08:48:23 +09:00
Jeff King	72ad6dc368	test-lib: move malloc-debug setup after $PATH setup Originally, the conditional definition of the setup/teardown functions for malloc checking could be run at any time, because they depended only on command-line options and the system getconf function. But since `02d900361c` (test-lib: check malloc debug LD_PRELOAD before using, 2024-11-11), we probe the system by running "git version". Since this code runs before we've set $PATH to point to the version of Git we intend to test, we actually run the system version of git. This mostly works, since what we really care about is whether the LD_PRELOAD works, and it should work the same with any program. But there are some corner cases: 1. You might not have a system git at all, in which case the preload will appear to fail, even though it could work with the actual built version of git. 2. Your system git could be linked in a different way. For example, if it was built statically, then it will ignore LD_PRELOAD entirely, and we might assume that the preload works, even though it might not when used with a dynamic build. We could give a more complete path to the version of Git we intend to test, but features like GIT_TEST_INSTALLED make that not entirely trivial. So instead, let's just bump the setup until after we've set up the $PATH. There's no need for us to do it early, as long as it is done before the first test runs. Reported-by: Toon Claes <toon@iotcl.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-14 12:19:26 +09:00
Junio C Hamano	25b0f41288	The ninth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-13 08:35:34 +09:00
Junio C Hamano	183ea3eabf	Merge branch 'ps/mingw-rename' The MinGW compatibility layer has been taught to support POSIX semantics for atomic renames when other process(es) have a file opened at the destination path. * ps/mingw-rename: compat/mingw: support POSIX semantics for atomic renames compat/mingw: allow deletion of most opened files compat/mingw: share file handles created via `CreateFileW()`	2024-11-13 08:35:34 +09:00
Junio C Hamano	486c9d3995	Merge branch 'jt/commit-graph-missing' A regression where commit objects missing from a commit-graph can cause an infinite loop when doing a fetch in a partial clone has been fixed. * jt/commit-graph-missing: fetch-pack: die if in commit graph but not obj db Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"	2024-11-13 08:35:33 +09:00
Junio C Hamano	51ba601160	Merge branch 'en/shallow-exclude-takes-a-ref-fix' The "--shallow-exclude=<ref>" option to various history transfer commands takes a ref, not an arbitrary revision. * en/shallow-exclude-takes-a-ref-fix: doc: correct misleading descriptions for --shallow-exclude upload-pack: fix ambiguous error message	2024-11-13 08:35:32 +09:00
Junio C Hamano	110c8fe8f5	Merge branch 'ak/t1016-style' Test modernization. * ak/t1016-style: t1016: clean up style	2024-11-13 08:35:32 +09:00
Junio C Hamano	6890c99e38	Merge branch 'ps/leakfixes-part-9' More leakfixes. * ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...	2024-11-13 08:35:31 +09:00
Simon Marchi	98e4015593	builtin/difftool: intialize some hashmap variables When running a dir-diff command that produces no diff, variables `wt_modified` and `tmp_modified` are used while uninitialized, causing: $ /home/smarchi/src/git/git-difftool --dir-diff master free(): invalid pointer [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master ... Invalid free() / delete / delete[] / realloc() at 0x48478EF: free (vg_replace_malloc.c:989) by 0x422CAC: hashmap_clear_ (hashmap.c:208) by 0x283830: run_dir_diff (difftool.c:667) by 0x284103: cmd_difftool (difftool.c:801) by 0x238E0F: run_builtin (git.c:484) by 0x2392B9: handle_builtin (git.c:750) by 0x2399BC: cmd_main (git.c:921) by 0x356FEF: main (common-main.c:64) Address 0x1ffefff180 is on thread 1's stack in frame #2, created by run_dir_diff (difftool.c:358) ... If taking any `goto finish` path before these variables are initialized, `hashmap_clear_and_free()` operates on uninitialized data, sometimes causing a crash. This regression was introduced in `7f795a1715` (builtin/difftool: plug several trivial memory leaks, 2024-09-26). Fix it by initializing those variables with the `HASHMAP_INIT` macro. Add a test comparing the main branch to itself, resulting in no diff. Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-13 08:11:19 +09:00
Jeff King	fe17a25905	refspec: store raw refspecs inside refspec_item The refspec struct keeps two matched arrays: one for the refspec_item structs and one for the original raw refspec strings. The main reason for this is that there are other users of refspec_item that do not care about the raw strings. But it does make managing the refspec struct awkward, as we must keep the two arrays in sync. This has led to bugs in the past (both leaks and double-frees). Let's just store a copy of the raw refspec string directly in each refspec_item struct. This simplifies the handling at a small cost: 1. Direct callers of refspec_item_init() will now get an extra copy of the refspec string, even if they don't need it. This should be negligible, as the struct is already allocating two strings for the parsed src/dst values (and we tend to only do it sparingly anyway for things like the TAG_REFSPEC literal). 2. Users of refspec_appendf() will now generate a temporary string, copy it, and then free the result (versus handing off ownership of the temporary string). We could get around this by having a "nodup" variant of refspec_item_init(), but it doesn't seem worth the extra complexity for something that is not remotely a hot code path. Code which accesses refspec->raw now needs to look at refspec->item.raw. Other callers which just use refspec_item directly can remain the same. We'll free the allocated string in refspec_item_clear(), which they should be calling anyway to free src/dst. One subtle note: refspec_item_init() can return an error, in which case we'll still have set its "raw" field. But that is also true of the "src" and "dst" fields, so any caller which does not _clear() the failed item is already potentially leaking. In practice most code just calls die() on an error anyway, but you can see the exception in valid_fetch_refspec(), which does correctly call _clear() even on error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 18:16:48 +09:00
Jeff King	d36af33081	refspec: drop separate raw_nr count A refspec struct contains zero or more refspec_item structs, along with matching "raw" strings. The items and raw strings are kept in separate arrays, but those arrays will always have the same length (because we write them only via refspec_append_nodup(), which grows both). This can lead to bugs when manipulating the array, since the arrays and lengths must be modified in lockstep. For example, the bug fixed in the previous commit, which forgot to decrement raw_nr. So let's get rid of "raw_nr" and have only "nr", making this kind of bug impossible (and also making it clear that the two are always matched, something that existing code already assumed but was not guaranteed by the interface). Even though we'd expect "alloc" and "raw_alloc" to likewise move in lockstep, we still need to keep separate counts there if we want to continue to use ALLOC_GROW() for both. Conceptually this would all be simpler if refspec_item just held onto its own raw string, and we had a single array. But there are callers which use refspec_item outside of "struct refspec" (and so don't hold on to a matching "raw" string at all), which we'd possibly need to adjust. So let's not worry about refactoring that for now, and just get rid of the redundant count variable. That is the first step on the road to combining them anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 18:16:48 +09:00
Jeff King	b970509c59	fetch: adjust refspec->raw_nr when filtering prefetch refspecs In filter_prefetch_refspecs(), we may remove one or more refspecs if they point into refs/tags/. When we do, we remove the item from the refspec->items array, shifting subsequent items down, and then decrement the refspec->nr count. We also remove the item from the refspec->raw array, but fail to decrement refspec->raw_nr. This leaves us with a count that is too high, and anybody looking at the "raw" array will erroneously see either: 1. The removed entry, if there were no subsequent items to shift down. 2. A duplicate of the final entry, as everything is shifted down but there was nothing to overwrite the final item. The obvious culprit to run into this is calling refspec_clear(), which will try to free the removed entry (case 1) or double-free the final entry (case 2). But even though the bug has existed since the function was added in `2e03115d0c` (fetch: add --prefetch option, 2021-04-16), we did not trigger it in the test suite. The --prefetch option is normally only used with configured refspecs, and we never bother to call refspec_clear() on those (they are stored as part of a struct remote, which is held in a global variable). But you could trigger case 2 manually like: git fetch --prefetch . refs/tags/foo refs/tags/bar Ironically you couldn't trigger case 1, because the code accidentally leaked the string in the raw array, and the two bugs (the leak and the double-free) cancelled out. But when we fixed the leak in `ea4780307c` (fetch: free "raw" string when shrinking refspec, 2024-09-24), it became possible to trigger that, too, with a single item: git fetch --prefetch . refs/tags/foo We can fix both cases by just correctly decrementing "raw_nr" when we shrink the array. Even though we don't expect people to use --prefetch with command-line refspecs, we'll add a test to make sure it behaves well (like the test just before it, we're just confirming that the filtered prefetch succeeds at all). Reported-by: Eric Mills <ermills@epic.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 18:16:47 +09:00
Jonathan Tan	c08589efdc	index-pack: repack local links into promisor packs Teach index-pack to, when processing the objects in a pack with --promisor specified on the CLI, repack local objects (and the local objects that they refer to, recursively) referenced by these objects into promisor packs. This prevents the situation in which, when fetching from a promisor remote, we end up with promisor objects (newly fetched) referring to non-promisor objects (locally created prior to the fetch). This situation may arise if the client had previously pushed objects to the remote, for example. One issue that arises in this situation is that, if the non-promisor objects become inaccessible except through promisor objects (for example, if the branch pointing to them has moved to point to the promisor object that refers to them), then GC will garbage collect them. There are other ways to solve this, but the simplest seems to be to enforce the invariant that we don't have promisor objects referring to non-promisor objects. This repacking is done from index-pack to minimize the performance impact. During a fetch, the only time most objects are fully inflated in memory is when their object ID is computed, so we also scan the objects (to see which objects they refer to) during this time. Also to minimize the performance impact, an object is calculated to be local if it's a loose object or present in a non-promisor pack. (If it's also in a promisor pack or referred to by an object in a promisor pack, it is technically already a promisor object. But a misidentification of a promisor object as a non-promisor object is relatively benign here - we will thus repack that promisor object into a promisor pack, duplicating it in the object store, but there is no correctness issue, just an issue of inefficiency.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 10:18:16 +09:00

... 15 16 17 18 19 ...

76201 Commits