mirrors/git - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Elijah Newren	c55c3f20b1	merge-ort: loosen commented requirements The comment above type_short_descriptions claimed that the order had to match what was found in the conflict_info_and_types enum. Since type_short_descriptions uses designated initializers, the order should not actually matter; I am guessing that positional initializers may have been under consideration when that comment was added, but the comment was not updated when designated initializers were chosen. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:35:25 -07:00
Elijah Newren	5fadf1f933	merge-ort: clearer propagation of failure-to-function from merge_submodule The 'clean' member variable is somewhat of a tri-state (1 = clean, 0 = conflicted, -1 = failure-to-determine), but we often like to think of it as binary (ignoring the possibility of a negative value) and use constructs like '!clean' to reflect this. However, these constructs can make codepaths more difficult to understand, unless we handle the negative case early and return pre-emptively; do that in handle_content_merge() to make the code a bit easier to read. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:35:24 -07:00
Elijah Newren	9ed8e17d8a	merge-ort: fix type of local 'clean' var in handle_content_merge () handle_content_merge() returns an int. Every caller of handle_content_merge() expects an int. However, we declare a local variable 'clean' that we use for the return value to be unsigned. To make matters worse, we also assign 'clean' the return value of merge_submodule() in one codepath, which is defined to return an int. It seems that the only reason to have 'clean' be unsigned was to allow a cutesy bit manipulation operation to be well-defined. Fix the type of the 'clean' local in handle_content_merge(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:35:24 -07:00
Elijah Newren	0b4f726cde	merge-ort: maintain expected invariant for priv member The calling convention for the merge machinery is One call to init_merge_options() One or more calls to merge_incore_[non]recursive() One call to merge_finalize() (possibly indirectly via merge_switch_to_result()) Both merge_switch_to_result() and merge_finalize() expect opt->priv == NULL && result->priv != NULL which is supposed to be set up by our move_opt_priv_to_result_priv() function. However, two codepaths dealing with error cases did not execute this necessary logic, which could result in assertion failures (or, if assertions were compiled out, could result in segfaults). Fix the oversight and add a test that would have caught one of these problems. While at it, also tighten an existing test for a non-recursive merge to verify that it fails with appropriate status. Most merge tests in the testsuite check either for success or conflicts; those testing for neither are rare and it is good to ensure they support the invariant assumed by builtin/merge.c in this comment: /* * The backend exits with 1 when conflicts are * left to be resolved, with 2 when it does not * handle the given merge at all. */ So, explicitly check for the exit status of 2 in these cases. Reported-by: Matt Cree <matt.cree@gearset.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:35:24 -07:00
Elijah Newren	e79bdb426c	merge-ort: extract handling of priv member into reusable function In preparation for a subsequent commit which will ensure we do not forget to maintain our invariants for the priv member in error codepaths, extract the necessary functionality out into a separate function. This change is cosmetic at this point, and introduces no changes beyond an extra assertion sanity check. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:35:24 -07:00
Xing Xin	63d903ff52	unbundle: extend object verification for fetches The existing fetch.fsckObjects and transfer.fsckObjects configurations were not fully applied to bundle-involved fetches, including direct bundle fetches and bundle-uri enabled fetches. Furthermore, there was no object verification support for unbundle. This commit extends object verification support in `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When this option is enabled, we append the `--fsck-objects` flag to `git-index-pack`. The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches, where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable this option for `bundle.c:unbundle`, specifically in: - `transport.c:fetch_refs_from_bundle` for direct bundle fetches. - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches. This addition ensures a consistent logic for object verification during fetches. Tests have been added to confirm functionality in the scenarios mentioned above. Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:30:08 -07:00
Xing Xin	d0cbc75680	fetch-pack: expose fsckObjects configuration logic Currently, we can use "transfer.fsckObjects" and the more specific "fetch.fsckObjects" to control checks for broken objects in received packs during fetches. However, these configurations were only acknowledged by `fetch-pack.c:get_pack` and did not take effect in direct bundle fetches or fetches with _bundle-uri_ enabled. This commit exposes the fetch-then-transfer configuration logic by adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This new function is used to replace the assignment for `fsck_objects` in `fetch-pack.c:get_pack`. In the next commit, this function will also be used to extend fsck support for bundle-involved fetches. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:30:07 -07:00
Xing Xin	3079026fc1	bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:30:07 -07:00
Eric Wong	75daa42ddf	t1006: ensure cat-file info isn't buffered by default While working on buffering changes to `git cat-file' in a separate patch, I inadvertently made the output of --batch-check and the `info' command of --batch-command buffered as if opt->buffer_output is turned on by default. Buffering by default breaks some 3rd-party Perl scripts using cat-file, but this breakage was not detected anywhere in our test suite. Add a small Perl snippet to test this problem since (AFAIK) other equivalent ways to test this behavior from Bourne shell and/or awk would require racy sleeps, non-portable FIFOs or tedious C code. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-20 10:28:46 -07:00
Peter Krefting	47d2691ae9	git-gui: sv.po: Update Swedish translation (576t0f0u) Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Johannes Sixt <j6t@kdbg.org>	2024-06-20 19:27:35 +02:00
Kyle Zhao	2e5a636593	merge: avoid write merge state when unable to write index Writing the merge state after the index write fails is meaningless and could potentially cause Git to lose changes. Signed-off-by: Kyle Zhao <kylezhao@tencent.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-18 08:13:35 -07:00
Junio C Hamano	66ac6e4bcd	The fourteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-17 15:55:59 -07:00
Junio C Hamano	4216329457	Merge branch 'ps/no-writable-strings' Building with "-Werror -Wwrite-strings" is now supported. * ps/no-writable-strings: (27 commits) config.mak.dev: enable `-Wwrite-strings` warning builtin/merge: always store allocated strings in `pull_twohead` builtin/rebase: always store allocated string in `options.strategy` builtin/rebase: do not assign default backend to non-constant field imap-send: fix leaking memory in `imap_server_conf` imap-send: drop global `imap_server_conf` variable mailmap: always store allocated strings in mailmap blob revision: always store allocated strings in output encoding remote-curl: avoid assigning string constant to non-const variable send-pack: always allocate receive status parse-options: cast long name for OPTION_ALIAS http: do not assign string constant to non-const field compat/win32: fix const-correctness with string constants pretty: add casts for decoration option pointers object-file: make `buf` parameter of `index_mem()` a constant object-file: mark cached object buffers as const ident: add casts for fallback name and GECOS entry: refactor how we remove items for delayed checkouts line-log: always allocate the output prefix line-log: stop assigning string constant to file parent buffer ...	2024-06-17 15:55:58 -07:00
Junio C Hamano	72576d139d	Merge branch 'jk/imap-send-plug-all-msgs-leak' A leak in "git imap-send" that somehow escapes LSan has been plugged. * jk/imap-send-plug-all-msgs-leak: imap-send: free all_msgs strbuf in "out" label	2024-06-17 15:55:58 -07:00
Junio C Hamano	42b8b5bfd0	Merge branch 'jk/am-retry' "git am" has a safety feature to prevent it from starting a new session when there already is a session going. It reliably triggers when a mbox is given on the command line, but it has to rely on the tty-ness of the standard input. Add an explicit way to opt out of this safety with a command line option. * jk/am-retry: test-terminal: drop stdin handling am: add explicit "--retry" option	2024-06-17 15:55:56 -07:00
Junio C Hamano	cff3b034d5	Merge branch 'jc/varargs-attributes' Varargs functions that are unannotated as printf-like or execl-like have been annotated as such. * jc/varargs-attributes: __attribute__: add a few missing format attributes __attribute__: mark some functions with LAST_ARG_MUST_BE_NULL __attribute__: remove redundant attribute declaration for git_die_config() __attribute__: trace2_region_enter_printf() is like "printf"	2024-06-17 15:55:55 -07:00
Junio C Hamano	40a163f217	Merge branch 'ps/ref-storage-migration' A new command has been added to migrate a repository that uses the files backend for its ref storage to use the reftable backend, with limitations. * ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version	2024-06-17 15:55:55 -07:00
Junio C Hamano	dfd668fa84	Merge branch 'ps/check-docs-fix' "make check-docs" noticed problems and reported to its output but failed to signal its findings with its exit status, which has been corrected. * ps/check-docs-fix: ci/test-documentation: work around SyntaxWarning in Python 3.12 gitlab-ci: add job to run `make check-docs` Documentation/lint-manpages: bubble up errors Makefile: extract script to lint missing/extraneous manpages	2024-06-17 15:55:54 -07:00
Junio C Hamano	4551858c18	Merge branch 'ps/ci-fix-detection-of-ubuntu-20' Fix for an embarrassing typo that prevented Python2 tests from running anywhere. * ps/ci-fix-detection-of-ubuntu-20: ci: fix check for Ubuntu 20.04	2024-06-17 15:55:53 -07:00
Junio C Hamano	7e2d0348d8	Merge branch 'ap/credential-clear-fix' Upon expiration event, the credential subsystem forgot to clear in-core authentication material other than password (whose support was added recently), which has been corrected. * ap/credential-clear-fix: credential: clear expired c->credential, unify secret clearing	2024-06-17 15:55:53 -07:00
Junio C Hamano	4d8ae4d3ca	Merge branch 'jc/format-patch-with-range-diff' The inter/range-diff output has been moved to the end of the patch when format-patch adds it to a single patch, instead of writing it before the patch text, to be consistent with what is done for a cover letter for a multi-patch series. * jc/format-patch-with-range-diff: format-patch: move range/inter diff at the end of a single patch output show_log: factor out interdiff/range-diff generation	2024-06-17 15:55:52 -07:00
Eric Wong	8270201971	Git.pm: use array in command_bidi_pipe example command_bidi_pipe takes the git command and optional arguments as an array, not a string. Make sure the documentation example is usable code. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-17 13:41:51 -07:00
Kyle Lippincott	34d982caaf	attr: fix msan issue in read_attr_from_index Memory sanitizer (msan) is detecting a use of an uninitialized variable (`size`) in `read_attr_from_index`: ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11 #1 0x5651f3415530 in read_attr git/attr.c #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6 #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2 #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2 #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2 #6 0x5651f34728da in convert_attrs git/convert.c:1320:2 #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2 #8 0x5651f357a35e in index_fd git/object-file.c:2630:34 #9 0x5651f357aa15 in index_path git/object-file.c:2657:7 #10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7 #11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9 #12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7 #13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18 #14 0x5651f321d327 in run_builtin git/git.c:474:11 #15 0x5651f321bc9e in handle_builtin git/git.c:729:3 #16 0x5651f321a792 in run_argv git/git.c:793:4 #17 0x5651f321a792 in cmd_main git/git.c:928:19 #18 0x5651f33dde1f in main git/common-main.c:62:11 The issue exists because `size` is an output parameter from `read_blob_data_from_index`, but it's only modified if `read_blob_data_from_index` returns non-NULL. The read of `size` when calling `read_attr_from_buf` unconditionally may read from an uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL before reading from `size`, but by then it's already too late: the uninitialized read will have happened already. Furthermore, there's no guarantee that the compiler won't reorder things so that it checks `size` before checking `!buf`. Make the call to `read_attr_from_buf` conditional on `buf` being non-NULL, ensuring that `size` is not read if it's never set. Signed-off-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-17 13:32:42 -07:00
Taylor Blau	a83e21de6b	pack-bitmap.c: ensure pseudo-merge offset reads are bounded After reading the pseudo-merge extension's metadata table, we allocate an array to store information about each pseudo-merge, including its byte offset within the .bitmap file itself. This is done like so: pseudo_merge_ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t)); for (i = 0; i < index->pseudo_merges.nr; i++) { index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs); pseudo_merge_ofs += sizeof(uint64_t); } But if the pseudo-merge table is corrupt, we'll keep calling get_be64() past the end of the pseudo-merge extension, potentially reading off the end of the mmap'd region. Prevent this by ensuring that we have at least `table_size - 24` many bytes available to read (adding 24 to the left-hand side of our inequality to account for the length of the metadata component). This is sufficient to prevent us from reading off the end of the pseudo-merge extension, and ensures that all of the get_be64() calls below are in bounds. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 14:19:27 -07:00
Taylor Blau	20c49432e4	Documentation/technical/bitmap-format.txt: add missing position table While investigating a benign Coverity warning on the new pseudo-merge implementation, I was struggling to understand the (paraphrased) below: ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t)); for (i = 0; i < index->pseudo_merges.nr; i++) { index->pseudo_merges.v[i].at = get_be64(ofs); ofs += sizeof(uint64_t); } , in pack-bitmap.c::load_bitmap_header(). Looking at the documentation, the diagram describing the on-disk format (prior to this patch) suggested that the optional extended lookup table immediately preceded the trailing metadata portion. If that were the case, that would make the above code from load_bitmap_header() incorrect, as we'd be blindly reading into the extended offset table. But later on in the documentation there is a description of the pseudo-merge position table as immediately preceding the trailing metadata portion of the extension. And indeed, we do write the position table in pack-bitmap-write.c: /* write positions for all pseudo merges */ for (i = 0; i < writer->pseudo_merges_nr; i++) hashwrite_be64(f, pseudo_merge_ofs[i]); hashwrite_be32(f, writer->pseudo_merges_nr); hashwrite_be32(f, kh_size(writer->pseudo_merge_commits)); hashwrite_be64(f, table_start - start); hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t)); So this is purely a case of the diagram being out of sync with the textual description and actual implementation of the format specification. Add the missing component back to the format diagram to avoid further confusion in this area. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 14:19:26 -07:00
Patrick Steinhardt	dc89b7d522	hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` Guard declarations of functions that implicitly use `the_repository` with `USE_THE_REPOSITORY_VARIABLE` such that callers don't accidentally rely on that global variable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:35 -07:00
Patrick Steinhardt	912d4756cd	t/helper: remove dependency on `the_repository` in "proc-receive" The "proc-receive" test helper implicitly relies on `the_repository` via `parse_oid_hex()`. This isn't necessary though, and in fact the whole command does not depend on `the_repository` at all. Stop setting up `the_repository` and use `parse_oid_hex_any()` to parse object IDs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:35 -07:00
Patrick Steinhardt	8e9a1d0dc2	t/helper: fix segfault in "oid-array" command without repository The "oid-array" test helper can supposedly work without a Git repository, but will in fact crash because `the_repository->hash_algo` is not initialized. This is because `oid_pos()`, which is used by `oid_array_lookup()`, depends on `the_hash_algo->rawsz`. Ideally, we'd adapt `oid_pos()` to not depend on `the_hash_algo` anymore. That is a bigger untertaking though, so instead we fall back to SHA1 when there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	fa9e009aa7	t/helper: use correct object hash in partial-clone helper The `object_info()` function of the partial-clone helper is responsible for checking the object ID of a repository other than `the_repository`. We use `parse_oid_hex()` in this function though, which means that we still depend on `the_repository->hash_algo`. Fix this by using the object hash of the function-local repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	2a0e11479f	compat/fsmonitor: fix socket path in networked SHA256 repos The IPC socket used by the fsmonitor on Darwin is usually contained in the Git repository itself. When the repository is hosted on a networked filesystem though, we instead create the socket path in the user's home directory or the socket directory. In that case, we derive the path by hashing the repository path. But while we always use SHA1 to hash the repository path, we then end up using `hash_to_hex()` to append the computed hash to the socket path. This is wrong because `hash_to_hex()` uses the hash algorithm configured in `the_repository`, which may not be SHA1. The consequence is that we may append uninitialized bytes to the path when operating in a SHA256 repository. Fix this bug by using `hash_to_hex_algop()` with SHA1. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	99cf4d6d35	replace-object: use hash algorithm from passed-in repository In `register_replace_ref()`, we pass in a repository but then use `get_oid_hex()` to parse passed-in object IDs, which implicitly uses `the_repository`. Fix this by using the hash algorithm from the passed-in repository instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	58650befd9	protocol-caps: use hash algorithm from passed-in repository In `send_info()`, we pass in a repository but then use `get_oid_hex()` to parse passed-in object IDs, which implicitly uses `the_repository`. Fix this by using the hash algorithm from the passed-in repository instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	f2c32a66f5	oidset: pass hash algorithm when parsing file The `oidset_parse_file_carefully()` function implicitly depends on `the_repository` when parsing object IDs. Fix this by having callers pass in the hash algorithm to use. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	afa2c6ddc8	http-fetch: don't crash when parsing packfile without a repo The git-http-fetch(1) command accepts a `--packfile=` option, which allows the user to specify that it shall fetch a specific packfile, only. The parameter here is the hash of the packfile, which is specific to the object hash used by the repository. This requirement is implicit though via our use of `parse_oid_hex()`, which internally uses `the_repository`. The git-http-fetch(1) command allows for there to be no repository though, which only exists such that we can show usage via the "-h" option. In that case though, starting with `c8aed5e8da` (repository: stop setting SHA1 as the default object hash, 2024-05-07), `the_repository` does not have its object hash initialized anymore and thus we would crash when trying to parse the object ID outside of a repository. Fix this issue by dying immediately when we see a "--packfile=" parameter when outside a Git repository. This is not a functional regression as we would die later on with the same error anyway. Add a test to detect the segfault. We use the "nongit" function to do so, which we need to allow-list in `test_must_fail ()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:34 -07:00
Patrick Steinhardt	8a676bdc5c	hash-ll: merge with "hash.h" The "hash-ll.h" header was introduced via `d1cbe1e6d8` (hash-ll.h: split out of hash.h to remove dependency on repository.h, 2023-04-22) to make explicit the split between hash-related functions that rely on the global `the_repository`, and those that don't. This split is no longer necessary now that we we have removed the reliance on `the_repository`. Merge "hash-ll.h" back into "hash.h". This causes some code units to not include "repository.h" anymore, which requires us to add some forward declarations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	36026a0f30	refs: avoid include cycle with "repository.h" There is an include cycle between "refs.h" and "repository.h" via "commit.h", "object.h" and "hash.h". This has the effect that several definitions of structs and enums will not be visible once we merge "hash-ll.h" back into "hash.h" in the next commit. The only reason that "repository.h" includes "refs.h" is the definition of `enum ref_storage_format`. Move it into "repository.h" and have "refs.h" include "repository.h" instead to fix the cycle. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	e7da938570	global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	7abbca0e74	hash: require hash algorithm in `empty_tree_oid_hex()` The `empty_tree_oid_hex()` function use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. While at it, remove the unused `empty_blob_oid_hex()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	9c34eb93fb	hash: require hash algorithm in `is_empty_{blob,tree}_oid()` Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	861e8c76f6	hash: make `is_null_oid()` independent of `the_repository` The function `is_null_oid()` uses `oideq(oid, null_oid())` to check whether a given object ID is the all-zero object ID. `null_oid()` implicitly relies on `the_repository` though to return the correct null object ID. Get rid of this dependency by always comparing the complete hash array for being all-zeroes. This is possible due to the refactoring of object IDs so that their hash arrays are always fully initialized. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	d4d364b2c7	hash: convert `oidcmp()` and `oideq()` to compare whole hash With the preceding commit, the hash array of object IDs is now fully zero-padded even when the hash algorithm's output is smaller than the array length. With that, we can now adapt both `oidcmp()` and `oideq()` to unconditionally memcmp(3P) the whole array instead of depending on the hash size. While it may feel inefficient to compare unused bytes for e.g. SHA-1, in practice the compiler should now be able to produce code that is better optimized both because we have no branch anymore, but also because the size to compare is now known at compile time. Goldbolt spits out the following assembly on an x86_64 platform with GCC 14.1 for the old and new implementations of `oidcmp()`: oidcmp_old: movsx rax, DWORD PTR [rdi+32] test eax, eax jne .L2 mov rax, QWORD PTR the_repository[rip] cmp QWORD PTR [rax+16], 32 je .L6 .L4: mov edx, 20 jmp memcmp .L2: lea rdx, [rax+rax2] lea rax, [rax+rdx4] lea rax, hash_algos[0+rax8] cmp QWORD PTR [rax+16], 32 jne .L4 .L6: mov edx, 32 jmp memcmp oidcmp_new: mov edx, 32 jmp memcmp The new implementation gets ridi of all the branches and effectively only ends setting up `edx` for `memcmp()` and then calling it. And for `oideq()`: oideq_old: movsx rcx, DWORD PTR [rdi+32] mov rax, rdi mov rdx, rsi test ecx, ecx jne .L2 mov rcx, QWORD PTR the_repository[rip] cmp QWORD PTR [rcx+16], 32 mov rcx, QWORD PTR [rax] je .L12 .L4: mov rsi, QWORD PTR [rax+8] xor rcx, QWORD PTR [rdx] xor rsi, QWORD PTR [rdx+8] or rcx, rsi je .L13 .L8: mov eax, 1 test eax, eax sete al movzx eax, al ret .L2: lea rsi, [rcx+rcx2] lea rcx, [rcx+rsi4] lea rcx, hash_algos[0+rcx8] cmp QWORD PTR [rcx+16], 32 mov rcx, QWORD PTR [rax] jne .L4 .L12: mov rsi, QWORD PTR [rax+8] xor rcx, QWORD PTR [rdx] xor rsi, QWORD PTR [rdx+8] or rcx, rsi jne .L8 mov rcx, QWORD PTR [rax+16] mov rax, QWORD PTR [rax+24] xor rcx, QWORD PTR [rdx+16] xor rax, QWORD PTR [rdx+24] or rcx, rax jne .L8 xor eax, eax .L14: test eax, eax sete al movzx eax, al ret .L13: mov edi, DWORD PTR [rdx+16] cmp DWORD PTR [rax+16], edi jne .L8 xor eax, eax jmp .L14 oideq_new: mov rax, QWORD PTR [rdi] mov rdx, QWORD PTR [rdi+8] xor rax, QWORD PTR [rsi] xor rdx, QWORD PTR [rsi+8] or rax, rdx je .L5 .L2: mov eax, 1 xor eax, 1 ret .L5: mov rax, QWORD PTR [rdi+16] mov rdx, QWORD PTR [rdi+24] xor rax, QWORD PTR [rsi+16] xor rdx, QWORD PTR [rsi+24] or rax, rdx jne .L2 xor eax, eax xor eax, 1 ret Interestingly, the compiler decides to split the comparisons into two so that it first compares the lower half of the object ID for equality and then the upper half. If the first check shows a difference, then we wouldn't even end up comparing the second half. In both cases, the new generated code is significantly shorter and has way less branches. While I didn't benchmark the change, I'd be surprised if the new code was slower. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	c98d762ed9	global: ensure that object IDs are always padded The `oidcmp()` and `oideq()` functions only compare the prefix length as specified by the given hash algorithm. This mandates that the object IDs have a valid hash algorithm set, or otherwise we wouldn't be able to figure out that prefix. As we do not have a hash algorithm in many cases, for example when handling null object IDs, this assumption cannot always be fulfilled. We thus have a fallback in place that instead uses `the_repository` to derive the hash function. This implicit dependency is hidden away from callers and can be quite surprising, especially in contexts where there may be no repository. In theory, we can adapt those functions to always memcmp(3P) the whole length of their hash arrays. But there exist a couple of sites where we populate `struct object_id`s such that only the prefix of its hash that is actually used by the hash algorithm is populated. The remaining bytes are left uninitialized. The fact that those bytes are uninitialized also leads to warnings under Valgrind in some places where we copy those bytes. Refactor callsites where we populate object IDs to always initialize all bytes. This also allows us to get rid of `oidcpy_with_padding()`, for one because the input is now fully initialized, and because `oidcpy()` will now always copy the whole hash array. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	9da95bda74	hash: require hash algorithm in `oidread()` and `oidclr()` Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	f4836570a7	hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	129cb1b99d	hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions The functions `is_empty_{blob,tree}_sha1()` are mostly unused, except for a single callsite in "read-cache.c". Most callsites have long since been converted to use the equivalents that accept a `struct object_id` instead of a string. Adapt the remaining callsite and drop those functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Jeff King	aecd794fca	remote: drop checks for zero-url case Now that the previous commit removed the possibility that a "struct remote" will ever have zero url fields, we can drop a number of redundant checks and untriggerable code paths. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:39 -07:00
Jeff King	ffce821880	remote: always require at least one url in a remote When we return a struct from remote_get(), the result _almost_ always has at least one url. In remotes_remote_get_1(), we do this: if (name_given && !valid_remote(ret)) add_url_alias(remote_state, ret, name); if (!valid_remote(ret)) return NULL; So if the remote doesn't have a url, we give it one based on the name (this is how unconfigured urls are used as remotes). And if that doesn't work, we return NULL. But there's a catch: valid_remote() checks that we have at least one url _unless_ the remote.*.vcs field is set. This comes from `c578f51d52` (Add a config option for remotes to specify a foreign vcs, 2009-11-18), and the whole idea was to support remote helpers that don't have their own url. However, that mode has been broken since `25d5cc488a` (Pass unknown protocols to external protocol handlers, 2009-12-09)! That commit unconditionally looks at the url in get_helper(), causing a segfault with something like: git -c remote.foo.vcs=bar fetch foo We could fix that now, of course. But given that it has been broken for almost 15 years and nobody noticed, there's a better option. This weird "there might not be a url" special case requires checks all over the code base, and it's not clear if there are other similar segfaults lurking. It would be nice if we could drop that special case. So instead, let's let the "the remote name is the url" code kick in. If you have "remote.foo.vcs", then your url (unless otherwise configured) is "foo". This does have a visible effect compared to what `25d5cc488a` was trying to do. The idea back then is that for a remote without a url, we'd run: # only one command-line option! git-remote-bar foo whereas with our default url, now we'll run: git-remote-bar foo foo Again, in practice nobody can be relying on this because it has been segfaulting for 15 years. We should consider just removing this "vcs" config option entirely, but that would be a user-visible breakage. So by fixing it this way, we can keep things working that have been working, and simplify away one special case inside our code. This fixes the segfault from `25d5cc488a` (demonstrated by the test), and we can build further cleanups on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	7384e75618	t5801: test remote..vcs config The usual way to trigger a remote helper is to use the "::" syntax from: `87422439d1` (Allow specifying the remote helper in the url, 2009-11-18). Doing: git config remote.origin.url hg::https://example.com/repo will run "git-remote-hg origin https://example.com/repo". Or you can use the fallback handling from `25d5cc488a` (Pass unknown protocols to external protocol handlers, 2009-12-09): git config remote.origin.url "foo://bar" which will run "git-remote-foo origin foo://bar". But there's a third way, from `c578f51d52` (Add a config option for remotes to specify a foreign vcs, 2009-11-18): git config remote.origin.vcs foo git config remote.origin.url bar which will run "git-remote-foo origin bar". This is mostly redundant with the other methods, except that it is supposed to allow you to run without a URL at all. So: git config remote.origin.vcs foo would run "git-remote-foo origin" with no extra URL parameter (under the assumption that the helper somehow knows how to access the remote repo). However, this mode has been broken since `25d5cc488a`, shortly after it was added! That commit taught the transport code to always look at the URL string to parse off the "foo::" bits, meaning it would always segfault in the no-url case. You can see that with: git -c remote.foo.vcs=bar fetch foo Nobody seems to have noticed in the almost 15 years since, so presumably it's not a well-used feature. And without that, arguably the whole remote..vcs feature could be removed entirely, as it isn't offering anything you couldn't do with the "helper::" syntax. But it _does_ work if you have a URL, and it has been advertised in the documentation for all that time. So we shouldn't just remove it without warning. Likewise, even if we were going to deprecate it, we should avoid breaking it in the meantime. Since there are no tests for it at all, let's add a few basic ones: - this syntax doesn't work well with "git clone" (another point against it versus "helper::"). But we can use "clone -c" to set up the config manually, passing the URL as usual to clone. This does work, though note that I had to use --no-local in the test to avoid broken interactions between the local code and the helper. In the real world this would be a non-issue, since the remote URL would generally not also be a local Git repo! - likewise, we should be able to set up the config manually and fetch into a repository. This also works. - we can simulate a vcs that has no URL support by stuffing the remote path into another environment variable. This should work, but doesn't (it hits the segfault mentioned above). In the first two cases, I took the extra step of checking GIT_TRACE output to confirm that we actually ran the helper (since the URL is a valid Git repo, the clone/fetch would appear to work even if we didn't use the helper at all!). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	e2269a2b59	t5801: make remote-testgit GIT_DIR setup more robust Our tests use a fake helper that just imports from an existing Git repository. We're fed the path to that repo on the command line, and derive the GIT_DIR by tacking on "/.git". This is wrong if the path is a bare repository, but that's OK since this is just a limited test. But it's also wrong if the transport code feeds us the actual .git directory itself (i.e., we expect "/path/to/repo" but it gives us "/path/to/repo/.git"). None of the current tests do that, but let's future-proof ourselves against adding a test that does. We can instead ask "rev-parse" to set our GIT_DIR. Note that we have to first unset other git variables from our environment. Coming into this script, we'll have GIT_DIR set to the fetching repository, and we need to "switch" to the remote one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00
Jeff King	9badf97c42	remote: allow resetting url list Because remote.*.url is treated as a multi-valued key, there is no way to override previous config. So for example if you have remote.origin.url set to some wrong value, doing: git -c remote.origin.url=right fetch would not work. It would append "right" to the list, which means we'd still fetch from "wrong" (since subsequent values are used only as push urls). Let's provide a mechanism to reset the list, like we do for other multi-valued keys (e.g., credential.helper, http.extraheaders, and merge.suppressDest all use this "empty string means reset" pattern). Reported-by: Mathew George <mathewegeorge@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 09:34:38 -07:00

... 23 24 25 26 27 ...

75032 Commits