mirrors/git - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Junio C Hamano	b3e223ddda	Merge branch 'rs/i18n-cannot-be-used-together' Clean-up code that handles combinations of incompatible options. * rs/i18n-cannot-be-used-together: i18n: factorize even more 'incompatible options' messages	2023-12-18 14:10:12 -08:00
Junio C Hamano	468d49634f	Merge branch 'jb/reflog-expire-delete-dry-run-options' Command line parsing fix for "git reflog". * jb/reflog-expire-delete-dry-run-options: builtin/reflog.c: fix dry-run option short name	2023-12-18 14:10:12 -08:00
Junio C Hamano	ec5ab1482d	Merge branch 'js/update-urls-in-doc-and-comment' Stale URLs have been updated to their current counterparts (or archive.org) and HTTP links are replaced with working HTTPS links. * js/update-urls-in-doc-and-comment: doc: refer to internet archive doc: update links for andre-simon.de doc: switch links to https doc: update links to current pages	2023-12-18 14:10:12 -08:00
Junio C Hamano	66685e8555	Merge branch 'ps/commit-graph-less-paranoid' Earlier we stopped relying on commit-graph that (still) records information about commits that are lost from the object store, which has negative performance implications. The default has been flipped to disable this pessimization. * ps/commit-graph-less-paranoid: commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default	2023-12-18 14:10:11 -08:00
Junio C Hamano	02230b74e8	Merge branch 'cc/git-replay' Introduce "git replay", a tool meant on the server side without working tree to recreate a history. * cc/git-replay: replay: stop assuming replayed branches do not diverge replay: add --contained to rebase contained branches replay: add --advance or 'cherry-pick' mode replay: use standard revision ranges replay: make it a minimal server side command replay: remove HEAD related sanity check replay: remove progress and info output replay: add an important FIXME comment about gpg signing replay: change rev walking options replay: introduce pick_regular_commit() replay: die() instead of failing assert() replay: start using parse_options API replay: introduce new builtin t6429: remove switching aspects of fast-rebase	2023-12-18 14:10:11 -08:00
Junio C Hamano	3335365270	Merge branch 'ac/fuzz-show-date' Subject approxidate() and show_date() machinery to OSS-Fuzz. * ac/fuzz-show-date: fuzz: add new oss-fuzz fuzzer for date.c / date.h	2023-12-18 14:10:11 -08:00
Junio C Hamano	71c746632a	Merge branch 'ps/ref-deletion-updates' Simplify API implementation to delete references by eliminating duplication. * ps/ref-deletion-updates: refs: remove `delete_refs` callback from backends refs: deduplicate code to delete references refs/files: use transactions to delete references t5510: ensure that the packed-refs file needs locking	2023-12-18 14:10:11 -08:00
Junio C Hamano	f1c537705b	Merge branch 'js/packfile-h-typofix' Typofix. * js/packfile-h-typofix: packfile.c: fix a typo in `each_file_in_pack_dir_fn()`'s declaration	2023-12-18 14:10:10 -08:00
Jiang Xin	7033d5479b	pkt-line: do not chomp newlines for sideband messages When calling "packet_read_with_status()" to parse pkt-line encoded packets, we can turn on the flag "PACKET_READ_CHOMP_NEWLINE" to chomp newline character for each packet for better line matching. But when receiving data and progress information using sideband, we should turn off the flag "PACKET_READ_CHOMP_NEWLINE" to prevent mangling newline characters from data and progress information. When both the server and the client support "sideband-all" capability, we have a dilemma that newline characters in negotiation packets should be removed, but the newline characters in the progress information should be left intact. Add new flag "PACKET_READ_USE_SIDEBAND" for "packet_read_with_status()" to prevent mangling newline characters in sideband messages. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 13:24:38 -08:00
Jiang Xin	64220dc5f7	pkt-line: memorize sideband fragment in reader When we turn on the "use_sideband" field of the packet_reader, "packet_reader_read()" will call the function "demultiplex_sideband()" to parse and consume sideband messages. Sideband fragment which does not end with "\r" or "\n" will be saved in the sixth parameter "scratch" and it can be reused and be concatenated when parsing another sideband message. In "packet_reader_read()" function, the local variable "scratch" can only be reused by subsequent sideband messages. But if there is a payload message between two sideband fragments, the first fragment which is saved in the local variable "scratch" will be lost. To solve this problem, we can add a new field "scratch" in packet_reader to memorize the sideband fragment across different calls of "packet_reader_read()". Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 13:24:37 -08:00
Jiang Xin	eaa82f8e98	test-pkt-line: add option parser for unpack-sideband We can use the test helper program "test-tool pkt-line" to test pkt-line related functions. E.g.: * Use "test-tool pkt-line send-split-sideband" to generate sideband messages. * Pipe these generated sideband messages to command "test-tool pkt-line unpack-sideband" to test packet_reader_read() function. In order to make a complete test of the packet_reader_read() function, add option parser for command "test-tool pkt-line unpack-sideband". * To remove newlines in sideband messages, we can use: $ test-tool pkt-line unpack-sideband --chomp-newline * To preserve newlines in sideband messages, we can use: $ test-tool pkt-line unpack-sideband --no-chomp-newline * To parse sideband messages using "demultiplex_sideband()" inside the function "packet_reader_read()", we can use: $ test-tool pkt-line unpack-sideband --reader-use-sideband We also add new example sideband packets in send_split_sideband() and add several new test cases in t0070. Among these test cases, we pipe output of the "send-split-sideband" subcommand to the "unpack-sideband" subcommand. We found two issues: 1. The two splitted sideband messages "Hello," and " world!\n" should be concatenated together. But when we turn on use_sideband field of reader to parse sideband messages, the first part of the splitted message ("Hello,") is lost. 2. The newline characters in sideband 2 (progress info) and sideband 3 (error message) should be preserved, but they are both trimmed. Will fix the above two issues in subsequent commits. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 13:24:37 -08:00
Junio C Hamano	6d6f1cd7ee	doc: format.notes specify a ref under refs/notes/ hierarchy There is no 'ref/notes/' hierarchy. '[format] notes = foo' uses notes that are found in 'refs/notes/foo'. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 11:30:46 -08:00
Shreyansh Paliwal	37e8d795be	test-lib-functions.sh: fix test_grep fail message wording In the recent commit `2e87fca189` (test framework: further deprecate test_i18ngrep, 2023-10-31), the test_i18ngrep function was deprecated, and all the callers were updated to call the test_grep function instead. But test_grep inherited an error message that still refers to test_i18ngrep by mistake. Correct it so that a broken call to the test_grep will identify itself as such. Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 10:44:41 -08:00
René Scharfe	8277dbe987	git-compat-util: convert skip_{prefix,suffix}{,_mem} to bool Use the data type bool and its values true and false to document the binary return value of skip_prefix() and friends more explicitly. This first use of stdbool.h, introduced with C99, is meant to check whether there are platforms that claim support for C99, as tested by `7bc341e21b` (git-compat-util: add a test balloon for C99 support, 2021-12-01), but still lack that header for some reason. A fallback based on a wider type, e.g. int, would have to deal with comparisons somehow to emulate that any non-zero value is true: bool b1 = 1; bool b2 = 2; if (b1 == b2) puts("This is true."); int i1 = 1; int i2 = 2; if (i1 == i2) puts("Not printed."); #define BOOLEQ(a, b) (!(a) == !(b)) if (BOOLEQ(i1, i2)) puts("This is true."); So we'd be better off using bool everywhere without a fallback, if possible. That's why this patch doesn't include any. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 09:08:24 -08:00
Jiang Xin	18ce48918c	fetch: no redundant error message for atomic fetch If an error occurs during an atomic fetch, a redundant error message will appear at the end of do_fetch(). It was introduced in `b3a804663c` (fetch: make `--atomic` flag cover backfilling of tags, 2022-02-17). Because a failure message is displayed before setting retcode in the function do_fetch(), calling error() on the err message at the end of this function may result in redundant or empty error message to be displayed. We can remove the redundant error() function, because we know that the function ref_transaction_abort() never fails. While we can find a common pattern for calling ref_transaction_abort() by running command "git grep -A1 ref_transaction_abort", e.g.: if (ref_transaction_abort(transaction, &error)) error("abort: %s", error.buf); Following this pattern, we can tolerate the return value of the function ref_transaction_abort() being changed in the future. We also delay the output of the err message to the end of do_fetch() to reduce redundant code. With these changes, the test case "fetch porcelain output (atomic)" in t5574 will also be fixed. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 08:30:33 -08:00
Jiang Xin	97d82b2963	t5574: test porcelain output of atomic fetch The test case "fetch porcelain output" checks output of the fetch command. The error output must be empty with the follow assertion: test_must_be_empty stderr But this assertion fails if using atomic fetch. Refactor this test case to use different fetch options by splitting it into three test cases. 1. "setup for fetch porcelain output". 2. "fetch porcelain output": for non-atomic fetch. 3. "fetch porcelain output (atomic)": for atomic fetch. Add new command "test_commit ..." in the first test case, so that if we run these test cases individually (--run=4-6), "git rev-parse HEAD~" command will work properly. Run the above test cases, we can find that one test case has a known breakage, as shown below: ok 4 - setup for fetch porcelain output ok 5 - fetch porcelain output # TODO known breakage vanished not ok 6 - fetch porcelain output (atomic) # TODO known breakage The failed test case has an error message with only the error prompt but no message body, as follows: 'stderr' is not empty, it contains: error: In a later commit, we will fix this issue. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-18 08:30:32 -08:00
Junio C Hamano	bc62d27d5c	docs: MERGE_AUTOSTASH is not that special A handful of manual pages called MERGE_AUTOSTASH a "special ref", but there is nothing special about it. It merely is yet another pseudoref. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 14:08:28 -08:00
Junio C Hamano	dada38646a	docs: AUTO_MERGE is not that special A handful of manual pages called AUTO_MERGE a "special ref", but there is nothing special about it. It merely is yet another pseudoref. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 14:08:28 -08:00
Junio C Hamano	7122f4f747	refs.h: HEAD is not that special In-code comment explains pseudorefs but used a wrong nomenclature "special ref". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 14:08:28 -08:00
Junio C Hamano	2047b2c28c	git-bisect.txt: BISECT_HEAD is not that special The description of "git bisect --no-checkout" called BISECT_HEAD a "special ref", but there is nothing special about it. It merely is yet another pseudoref. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 14:08:28 -08:00
Junio C Hamano	d9a4bb3385	git.txt: HEAD is not that special The introductory text in "git help git" that describes HEAD called it "a special ref". It is special compared to the more regular refs like refs/heads/master and refs/tags/v1.0.0, but not that special, unlike truly special ones like FETCH_HEAD. Rewrite a few sentences to also introduce the distinction between a regular ref that contain the object name and a symbolic ref that contain the name of another ref. Update the description of HEAD that point at the current branch to use the more correct term, a "symbolic ref". This was found as part of auditing the documentation and in-code comments for uses of "special ref" that refer merely a "pseudo ref". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 14:08:21 -08:00
Eric Sunshine	68fcebfb1a	git-add.txt: add missing short option -A to synopsis With one exception, the synopsis for `git add` consistently lists the short counterpart alongside the long-form of each option (for instance, "[--edit \| -e]"). The exception is that -A is not mentioned alongside --all. Fix this inconsistency Reported-by: Benjamin Lehmann <ben.lehmann@gmail.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 13:01:51 -08:00
Patrick Steinhardt	647b5e0998	tests: adjust whitespace in chainlint expectations The "check-chainlint" target runs automatically when running tests and performs self-checks to verify that the chainlinter itself produces the expected output. Originally, the chainlinter was implemented via sed, but the infrastructure has been rewritten in `fb41727b7e` (t: retire unused chainlint.sed, 2022-09-01) to use a Perl script instead. The rewrite caused some slight whitespace changes in the output that are ultimately not of much importance. In order to be able to assert that the actual chainlinter errors match our expectations we thus have to ignore whitespace characters when diffing them. As the `-w` flag is not in POSIX we try to use `git diff -w --no-index` before we fall back to `diff -w -u`. To accomodate for cases where the host system has no Git installation we use the locally-compiled version of Git. This can result in problems though when the Git project's repository is using extensions that the locally-compiled version of Git doesn't understand. It will refuse to run and thus cause the checks to fail. Instead of improving the detection logic, fix our ".expect" files so that we do not need any post-processing at all anymore. This allows us to drop the `-w` flag when diffing so that we can always use diff(1) now. Note that we keep some of the post-processing of `chainlint.pl` output intact to strip leading line numbers generated by the script. Having these would cause a rippling effect whenever we add a new test that sorts into the middle of existing tests and would require us to renumerate all subsequent lines, which seems rather pointless. Signed-off-by: Patrick Steinhardt <ps@pks.im> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-15 08:36:14 -08:00
Taylor Blau	ba47d88795	t/perf: add performance tests for multi-pack reuse To ensure that we don't regress either the size or runtime performance of multi-pack reuse, add a performance test to measure both of these. The test partitions the objects in GIT_TEST_PERF_LARGE_REPO into 1, 10, and 100 packs, and then tries to perform a "clone" at each stage with both single- and multi-pack reuse enabled. Note that the `repack_into_n_chunks()` function in this new test script differs from the existing `repack_into_n()`. The former partitions the repository into N equal-sized chunks, while the latter produces N packs of five commits each (plus their objects), and then another pack with the remainder. On git.git, I can produce the following results on my machine: Test this tree -------------------------------------------------------------------------------- 5332.3: clone for 1-pack scenario (single-pack reuse) 1.57(2.99+0.15) 5332.4: clone size for 1-pack scenario (single-pack reuse) 231.8M 5332.5: clone for 1-pack scenario (multi-pack reuse) 1.79(2.96+0.21) 5332.6: clone size for 1-pack scenario (multi-pack reuse) 231.7M 5332.9: clone for 10-pack scenario (single-pack reuse) 3.89(16.75+0.35) 5332.10: clone size for 10-pack scenario (single-pack reuse) 209.9M 5332.11: clone for 10-pack scenario (multi-pack reuse) 1.56(2.99+0.17) 5332.12: clone size for 10-pack scenario (multi-pack reuse) 224.4M 5332.15: clone for 100-pack scenario (single-pack reuse) 8.24(54.31+0.59) 5332.16: clone size for 100-pack scenario (single-pack reuse) 278.3M 5332.17: clone for 100-pack scenario (multi-pack reuse) 2.13(2.44+0.33) 5332.18: clone size for 100-pack scenario (multi-pack reuse) 357.9M Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Taylor Blau	af626ac0e0	pack-bitmap: enable reuse from all bitmapped packs Now that both the pack-bitmap and pack-objects code are prepared to handle marking and using objects from multiple bitmapped packs for verbatim reuse, allow marking objects from all bitmapped packs as eligible for reuse. Within the `reuse_partial_packfile_from_bitmap()` function, we no longer only mark the pack whose first object is at bit position zero for reuse, and instead mark any pack contained in the MIDX as a reuse candidate. Provide a handful of test cases in a new script (t5332) exercising interesting behavior for multi-pack reuse to ensure that we performed all of the previous steps correctly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Taylor Blau	941074134c	pack-objects: allow setting `pack.allowPackReuse` to "single" In `e704fc7978` (pack-objects: introduce pack.allowPackReuse, 2019-12-18), the `pack.allowPackReuse` configuration option was introduced, allowing users to disable the pack reuse mechanism. To prepare for debugging multi-pack reuse, allow setting configuration to "single" in addition to the usual bool-or-int values. "single" implies the same behavior as "true", "1", "yes", and so on. But it will complement a new "multi" value (to be introduced in a future commit). When set to "single", we will only perform pack reuse on a single pack, regardless of whether or not there are multiple MIDX'd packs. This requires no code changes (yet), since we only support single pack reuse. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Taylor Blau	3bea0c0611	t/test-lib-functions.sh: implement `test_trace2_data` helper Introduce a helper function which looks for a specific (category, key, value) tuple in the output of a trace2 event stream. We will use this function in a future patch to ensure that the expected number of objects are reused from an expected number of packs. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Taylor Blau	54393e4e68	pack-objects: add tracing for various packfile metrics As part of the multi-pack reuse effort, we will want to add some tests that assert that we reused a certain number of objects from a certain number of packs. We could do this by grepping through the stderr output of `pack-objects`, but doing so would be brittle in case the output format changed. Instead, let's use the trace2 mechanism to log various pieces of information about the generated packfile, which we can then use to compare against desired values. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Taylor Blau	519e17ff75	pack-bitmap: prepare to mark objects from multiple packs for reuse Now that the pack-objects code is equipped to handle reusing objects from multiple packs, prepare the pack-bitmap code to mark objects from multiple packs as reuse candidates. In order to prepare the pack-bitmap code for this change, remove the same set of assumptions we unwound in previous commits from the helper function `reuse_partial_packfile_from_bitmap_1()`, in preparation for it to be called in a loop over the set of bitmapped packs in a following commit. Most importantly, we can no longer assume that the bit position corresponding to the first object in a given reuse pack candidate is at the beginning of the bitmap itself. For the single pack that this assumption is still true for (in MIDX bitmaps, this is the preferred pack, in single-pack bitmaps it is the pack the bitmap is tied to), we can still use our whole-words optimization. But for all subsequent packs, we can not make use of this optimization, since it assumes that all delta bases are being sent from the same pack, which would break if we are sending OFS_DELTAs down to the client. To understand why, consider two packs, P1 and P2 where: - P1 has object A which is a delta on base B - P2 has its own copy of B, in addition to other objects Suppose that the MIDX which covers P1 and P2 selected its copy of A from P1, but selected its copy of B from P2. Since A is a delta of B, but the base was selected from a different pack, sending the bytes corresponding to A as an OFS_DELTA verbatim from P1 would be incorrect, since we don't guarantee that B is in the same place relative to A in the generated pack as in P1. For now, we detect and reject these cross-pack deltas by searching for the (pack_id, offset) pair for the delta's base object (using the same pack_id as the pack containing the delta'd object) in the MIDX. If we find a match, that means that the MIDX did indeed pick the base object from the same pack, and we are OK to reuse the delta. If we don't find a match, however, that means that the base object was selected from a different pack in the MIDX, and we can let the slower path handle re-delta'ing our candidate object. In the future, there are a couple of other things we could do, namely: - Turn any cross-pack deltas (which are stored as OFS_DELTAs) into REF_DELTAs. We already do this today when reusing an OFS_DELTA without `--delta-base-offset` enabled, so it's not a huge stretch to do the same for cross-pack deltas even when `--delta-base-offset` is enabled. This would work, but would obviously result in larger-than-necessary packs, as we in theory could represent these cross-pack deltas by patching an existing OFS_DELTA. But it's not clear how much that would matter in practice. I suspect it would have a lot to do with how you pack your repository in the first place. - Finally, we could patch OFS_DELTAs across packs in a similar fashion as we do today for OFS_DELTAs within a single pack on either side of a gap. This would result in the smallest packs of the three options here, but implementing this would be more involved. At minimum, you'd have to keep the reusable chunks list for all reused packs, not just the one we're currently processing. And you'd have to ensure that any bases which are a part of cross-pack deltas appear before the delta. I think this is possible to do, but would require assembling the reusable chunks list potentially in a different order than they appear in the source packs. For now, let's pursue the simplest approach and reject any cross-pack deltas. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Taylor Blau	dbd5c520d2	pack-revindex: implement `midx_pair_to_pack_pos()` Now that we have extracted the `midx_key_to_pack_pos()` function, we can implement the `midx_pair_to_pack_pos()` function which accepts (pack_id, offset) tuples and returns an index into the psuedo-pack order. This will be used in a following commit in order to figure out whether or not the MIDX chose a given delta's base object from the same pack as the delta resides in. It will do so by locating the base object's offset in the pack, and then performing a binary search using the same pack ID with the base object's offset. If (and only if) it finds a match (at any position) we can guarantee that the MIDX selected both halves of the delta/base pair from the same pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	e1bfe30c4d	pack-revindex: factor out `midx_key_to_pack_pos()` helper The `midx_to_pack_pos()` function implements a binary search over objects in the MIDX between lexical and pseudo-pack order. It does this by taking in an index into the lexical order (i.e. the same argument you'd use for `nth_midxed_object_id()` and similar) and spits out a position in the pseudo-pack order. This works for all callers, since they currently all are translating from lexical order to pseudo-pack order. But future callers may want to translate a known (offset, pack_id) tuple into an index into the psuedo-pack order, without knowing where that (offset, pack_id) tuple appears in lexical order. Prepare for implementing a function that translates between a (offset, pack_id) tuple into an index into the psuedo-pack order by extracting a helper function which does just that, and then reimplementing midx_to_pack_pos() in terms of it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	b1e3333068	midx: implement `midx_preferred_pack()` When performing a binary search over the objects in a MIDX's bitmap (i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack ordering using a combination of (a) the preferred pack, (b) the pack's lexical position in the MIDX based on pack names, and (c) the object offset within the pack. In order to perform this binary search, the reader must know the identity of the preferred pack. This could be stored in the MIDX, but isn't for historical reasons, mostly because it can easily be inferred at read-time by looking at the object in the first bit position and finding out which pack it was selected from in the MIDX, like so: nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); In midx_to_pack_pos() which performs this binary search, we look up the identity of the preferred pack before each search. This is relatively quick, since it involves two table-driven lookups (one in the MIDX's revindex for `pack_pos_to_midx()`, and another in the MIDX's object table for `nth_midxed_pack_int_id()`). But since the preferred pack does not change after the MIDX is written, it is safe to cache this value on the MIDX itself. Write a helper to do just that, and rewrite all of the existing call-sites that care about the identity of the preferred pack in terms of this new helper. This will prepare us for a subsequent patch where we will need to binary search through the MIDX's pseudo-pack order multiple times. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	ed9f41480a	git-compat-util.h: implement checked size_t to uint32_t conversion In a similar fashion as other checked cast functions in this header (such as `cast_size_t_to_ulong()` and `cast_size_t_to_int()`), implement a checked cast function for going from a size_t to a uint32_t value. This function will be utilized in a future commit which needs to make such a conversion. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	b96289a10b	pack-objects: include number of packs reused in output In addition to including the number of objects reused verbatim from a reuse-pack, include the number of packs from which objects were reused. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	ca0fd69e37	pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse The function `write_reused_pack_verbatim()` within `builtin/pack-objects.c` is responsible for writing out a continuous set of objects beginning at the start of the reuse packfile. In the existing implementation, we did something like: while (pos < reuse_packfile_bitmap->word_alloc && reuse_packfile_bitmap->words[pos] == (eword_t)~0) pos++; if (pos) /* write first `pos * BITS_IN_WORD` objects from pack */ as an optimization to record a single chunk for the longest continuous prefix of objects wanted out of the reuse pack, instead of having a chunk for each individual object. For more details, see `bb514de356` (pack-objects: improve partial packfile reuse, 2019-12-18). In order to retain this optimization in a multi-pack reuse world, we can no longer assume that the first object in a pack is on a word boundary in the bitmap storing the set of reusable objects. Assuming that all objects from the beginning of the reuse packfile up to the object corresponding to the first bit on a word boundary are part of the result, consume whole words at a time until the last whole word belonging to the reuse packfile. Copy those objects to the resulting packfile, and track that we reused them by recording a single chunk. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	4805125710	pack-objects: prepare `write_reused_pack()` for multi-pack reuse The function `write_reused_pack()` within `builtin/pack-objects.c` is responsible for performing pack-reuse on a single pack, and has two main functions: - it dispatches a call to `write_reused_pack_verbatim()` to see if we can reuse portions of the packfile in whole-word chunks - for any remaining objects (that is, any objects that appear after the first "gap" in the bitmap), call write_reused_pack_one() on that object to record it for reuse. Prepare this function for multi-pack reuse by removing the assumption that the bit position corresponding to the first object being reused from a given pack must be at bit position zero. The changes in this function are mostly straightforward. Initialize `i` to the position of the first word to contain bits corresponding to that reuse pack. In most situations, we throw the initialized value away, since we end up replacing it with the return value from write_reused_pack_verbatim(), moving us past the section of whole words that we reused. Likewise, modify the per-object loop to ignore any bits at the beginning of the first word that do not belong to the pack currently being reused, as well as skip to the "done" section once we have processed the last bit corresponding to this pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	073b40eba0	pack-objects: pass `bitmapped_pack`'s to pack-reuse functions Further prepare pack-objects to perform verbatim pack-reuse over multiple packfiles by converting functions that take in a pointer to a `struct packed_git` to instead take in a pointer to a `struct bitmapped_pack`. The additional information found in the bitmapped_pack struct (such as the bit position corresponding to the beginning of the pack) will be necessary in order to perform verbatim pack-reuse. Note that we don't use any of the extra pieces of information contained in the bitmapped_pack struct, so this step is merely preparatory and does not introduce any functional changes. Note further that we do not change the argument type to write_reused_pack_one(). That function is responsible for copying sections of the packfile directly and optionally patching any OFS_DELTAs to account for not reusing sections of the packfile in between a delta and its base. As such, that function is (and should remain) oblivious to multi-pack reuse, and does not require any of the extra pieces of information stored in the bitmapped_pack struct. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	d1d701eb9c	pack-objects: keep track of `pack_start` for each reuse pack When reusing objects from a pack, we keep track of a set of one or more `reused_chunk`s, corresponding to sections of one or more object(s) from a source pack that we are reusing. Each chunk contains two pieces of information: - the offset of the first object in the source pack (relative to the beginning of the source pack) - the difference between that offset, and the corresponding offset in the pack we're generating The purpose of keeping track of these is so that we can patch an OFS_DELTAs that cross over a section of the reuse pack that we didn't take. For instance, consider a hypothetical pack as shown below: (chunk #2) __________... / / +--------+---------+-------------------+---------+ ... \| <base> \| <other> \| (unused) \| <delta> \| ... +--------+---------+-------------------+---------+ \ / \______________/ (chunk #1) Suppose that we are sending objects "base", "other", and "delta", and that the "delta" object is stored as an OFS_DELTA, and that its base is "base". If we don't send any objects in the "(unused)" range, we can't copy the delta'd object directly, since its delta offset includes a range of the pack that we didn't copy, so we have to account for that difference when patching and reassembling the delta. In order to compute this value correctly, we need to know not only where we are in the packfile we're assembling (with `hashfile_total(f)`) but also the position of the first byte of the packfile that we are currently reusing. Currently, this works just fine, since when reusing only a single pack those two values are always identical (because verbatim reuse is the first thing pack-objects does when enabled after writing the pack header). But when reusing multiple packs which have one or more gaps, we'll need to account for these two values diverging. Together, these two allow us to compute the reused chunk's offset difference relative to the start of the reused pack, as desired. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	5e29c3f707	pack-objects: parameterize pack-reuse routines over a single pack The routines pack-objects uses to perform verbatim pack-reuse are: - write_reused_pack_one() - write_reused_pack_verbatim() - write_reused_pack() , all of which assume that there is exactly one packfile being reused: the global constant `reuse_packfile`. Prepare for reusing objects from multiple packs by making reuse packfile a parameter of each of the above functions in preparation for calling these functions in a loop with multiple packfiles. Note that we still have the global "reuse_packfile", but pass it through each of the above function's parameter lists, eliminating all but one direct access (the top-level caller in `write_pack_file()`). Even after this series, we will still have a global, but it will hold the array of reusable packfiles, and we'll pass them one at a time to these functions in a loop. Note also that we will eventually need to pass a `bitmapped_pack` instead of a `packed_git` in order to hold onto additional information required for reuse (such as the bit position of the first object belonging to that pack). But that change will be made in a future commit so as to minimize the noise below as much as possible. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	83296d20e8	pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` Further prepare for enabling verbatim pack-reuse over multiple packfiles by changing the signature of reuse_partial_packfile_from_bitmap() to populate an array of `struct bitmapped_pack `'s instead of a pointer to a single packfile. Since the array we're filling out is sized dynamically[^1], add an additional `size_t ` parameter which will hold the number of reusable packs (equal to the number of elements in the array). Note that since we still have not implemented true multi-pack reuse, these changes aren't propagated out to the rest of the caller in builtin/pack-objects.c. In the interim state, we expect that the array has a single element, and we use that element to fill out the static `reuse_packfile` variable (which is a bog-standard `struct packed_git *`). Future commits will continue to push this change further out through the pack-objects code. [^1]: That is, even though we know the number of packs which are candidates for pack-reuse, we do not know how many of those candidates we can actually reuse. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	35e156b9de	pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature The signature of `reuse_partial_packfile_from_bitmap()` currently takes in a bitmap, as well as three output parameters (filled through pointers, and passed as arguments), and also returns an integer result. The output parameters are filled out with: (a) the packfile used for pack-reuse, (b) the number of objects from that pack that we can reuse, and (c) a bitmap indicating which objects we can reuse. The return value is either -1 (when there are no objects to reuse), or 0 (when there is at least one object to reuse). Some of these parameters are redundant. Notably, we can infer from the bitmap how many objects are reused by calling bitmap_popcount(). And we can similar compute the return value based on that number as well. As such, clean up the signature of this function to drop the "*entries" parameter, as well as the int return value, since the single caller of this function can infer these values themself. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:08 -08:00
Taylor Blau	e5d48bf38b	ewah: implement `bitmap_is_empty()` In a future commit, we will want to check whether or not a bitmap has any bits set in any of its words. The best way to do this (prior to the existence of this patch) is to call `bitmap_popcount()` and check whether the result is non-zero. But this is semi-wasteful, since we do not need to know the exact number of bits set, only whether or not there is at least one of them. Implement a new helper function to check just that. Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	dab60934e3	pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions When trying to assemble a pack with bitmaps using `--use-bitmap-index`, `pack-objects` asks the pack-bitmap machinery for a bitmap which indicates the set of objects we can "reuse" verbatim from on-disk. This set is roughly comprised of: a prefix of objects in the bitmapped pack (or preferred pack, in the case of a multi-pack reachability bitmap), plus any other objects not included in the prefix, excluding any deltas whose base we are not sending in the resulting pack. The pack-bitmap machinery is responsible for computing this bitmap, and does so with the following functions: - reuse_partial_packfile_from_bitmap() - try_partial_reuse() In the existing implementation, the first function is responsible for (a) marking the prefix of objects in the reusable pack, and then (b) calling try_partial_reuse() on any remaining objects to ensure that they are also reusable (and removing them from the bitmapped set if they are not). Likewise, the `try_partial_reuse()` function is responsible for checking whether an isolated object (that is, an object from the bitmapped pack/preferred pack not contained in the prefix from earlier) may be reused, i.e. that it isn't a delta of an object that we are not sending in the resulting pack. These functions are based on two core assumptions, which we will unwind in this and the following commits: 1. There is only a single pack from the bitmap which is eligible for verbatim pack-reuse. For single-pack bitmaps, this is trivially the bitmapped pack. For multi-pack bitmaps, this is (currently) the MIDX's preferred pack. 2. The pack eligible for reuse has its first object in bit position 0, and all objects from that pack follow in pack-order from that first bit position. In order to perform verbatim pack reuse over multiple packs, we must unwind these two assumptions. Most notably, in order to reuse bits from a given packfile, we need to know the first bit position occupied by an object form that packfile. To propagate this information around, pass a `struct bitmapped_pack ` anywhere we previously passed a `struct packed_git `, since the former contains the bitmap position we're interested in (as well as a pointer to the latter). As an additional step, factor out a sub-routine from the main `reuse_partial_packfile_from_bitmap()` function, called `reuse_partial_packfile_from_bitmap_1()`. This new function will be responsible for figuring out which objects may be reused from a single pack, and the existing function will dispatch multiple calls to its new helper function for each reusable pack. Consequently, `reuse_partial_packfile_from_bitmap()` will now maintain an array of reusable packs instead of a single such pack. We currently expect that array to have only a single element, so this awkward state is short-lived. It will serve as useful scaffolding in subsequent commits as we begin to work towards enabling multi-pack reuse. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	307d75bbe6	midx: implement `midx_locate_pack()` The multi-pack index API exposes a `midx_contains_pack()` function that takes in a string ending in either ".idx" or ".pack" and returns whether or not the MIDX contains a given pack corresponding to that string. There is no corresponding function to locate the position of a pack within the MIDX's pack order (sorted lexically by pack filename). We could add an optional out parameter to `midx_contains_pack()` that is filled out with the pack's position when the parameter is non-NULL. To minimize the amount of fallout from this change, instead introduce a new function by renaming `midx_contains_pack()` to `midx_locate_pack()`, adding that output parameter, and then reimplementing `midx_contains_pack()` in terms of it. Future patches will make use of this new function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	5f5ccd9595	midx: implement `BTMP` chunk When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse. This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the amount of objects that are reused verbatim (and consequently, decrease the amount of time it takes to generate many packs). But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse. As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly more expensive to maintain as the pack grows larger and larger. It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse are disjoint with respect to the subset of their objects being sent). If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to any of their objects we may output, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position. Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases). This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Bitampped Packs" (`BTMP`) chunk of the multi-pack index. The `BTMP` chunk's contents are described in detail here. Importantly, the `BTMP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent. For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk. [^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	fba68184b8	midx: factor out `fill_pack_info()` When selecting which packfiles will be written while generating a MIDX, the MIDX internals fill out a 'struct pack_info' with various pieces of book-keeping. Instead of filling out each field of the `pack_info` structure individually in each of the two spots that modify the array of such structures (`ctx->info`), extract a common routine that does this for us. This reduces the code duplication by a modest amount. But more importantly, it zero-initializes the structure before assigning values into it. This hardens us for a future change which will add additional fields to this structure which (until this patch) was not zero-initialized. As a result, any new fields added to the `pack_info` structure need only be updated in a single location, instead of at each spot within midx.c. There are no functional changes in this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	a96015a517	pack-bitmap: plug leak in find_objects() The `find_objects()` function creates an object_list for any tips of the reachability query which do not have corresponding bitmaps. The object_list is not used outside of `find_objects()`, but we never free it with `object_list_free()`, resulting in a leak. Let's plug that leak by calling `object_list_free()`, which results in t6113 becoming leak-free. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	6cdb67b97d	pack-bitmap-write: deep-clear the `bb_commit` slab The `bb_commit` commit slab is used by the pack-bitmap-write machinery to track various pieces of bookkeeping used to generate reachability bitmaps. Even though we clear the slab when freeing the bitmap_builder struct (with `bitmap_builder_clear()`), there are still pointers which point to locations in memory that have not yet been freed, resulting in a leak. Plug the leak by introducing a suitable `free_fn` for the `struct bb_commit` type, and make sure it is called on each member of the slab via the `deep_clear_bb_data()` function. Note that it is possible for both of the arguments to `bitmap_free()` to be NULL, but `bitmap_free()` is a noop for NULL arguments, so it is OK to pass them unconditionally. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Taylor Blau	66f0c71073	pack-objects: free packing_data in more places The pack-objects internals use a packing_data struct to track what objects are part of the pack(s) being formed. Since these structures contain allocated fields, failing to appropriately free() them results in a leak. Plug that leak by introducing a clear_packing_data() function, and call it in the appropriate spots. This is a fairly straightforward leak to plug, since none of the callers expect to read any values or have any references to parts of the address space being freed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:07 -08:00
Jeff King	dee182941f	mailinfo: avoid recursion when unquoting From headers Our unquote_comment() function is recursive; when it sees a comment within a comment, like: (this is an (embedded) comment) it recurses to handle the inner comment. This is fine for practical use, but it does mean that you can easily run out of stack space with a malicious header. For example: perl -e 'print "From: ", "(" x 2**18;' \| git mailinfo /dev/null /dev/null segfaults on my system. And since mailinfo is likely to be fed untrusted input from the Internet (if not by human users, who might recognize a garbage header, but certainly there are automated systems that apply patches from a list) it may be possible for an attacker to trigger the problem. That said, I don't think there's an interesting security vulnerability here. All an attacker can do is make it impossible to parse their email and apply their patch, and there are lots of ways to generate bogus emails. So it's more of an annoyance than anything. But it's pretty easy to fix it. The recursion is not helping us preserve any particular state from each level. The only flag in our parsing is take_next_literally, and we can never recurse when it is set (since the start of a new comment implies it was not backslash-escaped). So it is really only useful for finding the end of the matched pair of parentheses. We can do that easily with a simple depth counter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:33:52 -08:00

1 2 3 4 5 ...

71893 Commits