Comparing 5f9f86a2dc..1dd64c2d05 - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Johannes Schindelin	0bbcf95194	Git 2.31.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2023-02-06 09:24:07 +01:00
Johannes Schindelin	e14d6b8408	Sync with 2.30.8 * maint-2.30: Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport	2023-02-06 09:24:06 +01:00
Junio C Hamano	394a759d2b	Git 2.30.8 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-06 09:14:45 +01:00
Junio C Hamano	a3033a68ac	Merge branch 'ps/apply-beyond-symlink' into maint-2.30 Fix a vulnerability (CVE-2023-23946) that allows crafted input to trick `git apply` into writing files outside of the working tree. * ps/apply-beyond-symlink: dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2023-02-06 09:12:16 +01:00
Taylor Blau	2c9a4c7310	Merge branch 'tb/clone-local-symlinks' into maint-2.30 Resolve a security vulnerability (CVE-2023-22490) where `clone_local()` is used in conjunction with non-local transports, leading to arbitrary path exfiltration. * tb/clone-local-symlinks: dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport	2023-02-06 09:09:14 +01:00
Patrick Steinhardt	fade728df1	apply: fix writing behind newly created symbolic links When writing files git-apply(1) initially makes sure that none of the files it is about to create are behind a symlink: ``` $ git init repo Initialized empty Git repository in /tmp/repo/.git/ $ cd repo/ $ ln -s dir symlink $ git apply - <<EOF diff --git a/symlink/file b/symlink/file new file mode 100644 index 0000000..e69de29 EOF error: affected file 'symlink/file' is beyond a symbolic link ``` This safety mechanism is crucial to ensure that we don't write outside of the repository's working directory. It can be fooled though when the patch that is being applied creates the symbolic link in the first place, which can lead to writing files in arbitrary locations. Fix this by checking whether the path we're about to create is beyond a symlink or not. Tightening these checks like this should be fine as we already have these precautions in Git as explained above. Ideally, we should update the check we do up-front before starting to reflect the computed changes to the working tree so that we catch this case as well, but as part of embargoed security work, adding an equivalent check just before we try to write out a file should serve us well as a reasonable first step. Digging back into history shows that this vulnerability has existed since at least Git v2.9.0. As Git v2.8.0 and older don't build on my system anymore I cannot tell whether older versions are affected, as well. Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-03 14:41:31 -08:00
Taylor Blau	bffc762f87	dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS When using the dir_iterator API, we first stat(2) the base path, and then use that as a starting point to enumerate the directory's contents. If the directory contains symbolic links, we will immediately die() upon encountering them without the `FOLLOW_SYMLINKS` flag. The same is not true when resolving the top-level directory, though. As explained in a previous commit, this oversight in `6f054f9fb3` (builtin/clone.c: disallow `--local` clones with symlinks, 2022-07-28) can be used as an attack vector to include arbitrary files on a victim's filesystem from outside of the repository. Prevent resolving top-level symlinks unless the FOLLOW_SYMLINKS flag is given, which will cause clones of a repository with a symlink'd "$GIT_DIR/objects" directory to fail. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-24 16:52:16 -08:00
Taylor Blau	cf8f6ce02a	clone: delay picking a transport until after get_repo_path() In the previous commit, t5619 demonstrates an issue where two calls to `get_repo_path()` could trick Git into using its local clone mechanism in conjunction with a non-local transport. That sequence is: - the starting state is that the local path https:/example.com/foo is a symlink that points to ../../../.git/modules/foo. So it's dangling. - get_repo_path() sees that no such path exists (because it's dangling), and thus we do not canonicalize it into an absolute path - because we're using --separate-git-dir, we create .git/modules/foo. Now our symlink is no longer dangling! - we pass the url to transport_get(), which sees it as an https URL. - we call get_repo_path() again, on the url. This second call was introduced by `f38aa83f9a` (use local cloning if insteadOf makes a local URL, 2014-07-17). The idea is that we want to pull the url fresh from the remote.c API, because it will apply any aliases. And of course now it sees that there is a local file, which is a mismatch with the transport we already selected. The issue in the above sequence is calling `transport_get()` before deciding whether or not the repository is indeed local, and not passing in an absolute path if it is local. This is reminiscent of a similar bug report in [1], where it was suggested to perform the `insteadOf` lookup earlier. Taking that approach may not be as straightforward, since the intent is to store the original URL in the config, but to actually fetch from the insteadOf one, so conflating the two early on is a non-starter. Note: we pass the path returned by `get_repo_path(remote->url[0])`, which should be the same as `repo_name` (aside from any `insteadOf` rewrites). We could pass `absolute_pathdup()` of the same argument, which `86521acaca` (Bring local clone's origin URL in line with that of a remote clone, 2008-09-01) indicates may differ depending on the presence of ".git/" for a non-bare repo. That matters for forming relative submodule paths, but doesn't matter for the second call, since we're just feeding it to the transport code, which is fine either way. [1]: https://lore.kernel.org/git/CAMoD=Bi41mB3QRn3JdZL-FGHs4w3C2jGpnJB-CqSndO7FMtfzA@mail.gmail.com/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-24 16:52:16 -08:00
Taylor Blau	58325b93c5	t5619: demonstrate clone_local() with ambiguous transport When cloning a repository, Git must determine (a) what transport mechanism to use, and (b) whether or not the clone is local. Since `f38aa83f9a` (use local cloning if insteadOf makes a local URL, 2014-07-17), the latter check happens after the remote has been initialized, and references the remote's URL instead of the local path. This is done to make it possible for a `url.<base>.insteadOf` rule to convert a remote URL into a local one, in which case the `clone_local()` mechanism should be used. However, with a specially crafted repository, Git can be tricked into using a non-local transport while still setting `is_local` to "1" and using the `clone_local()` optimization. The below test case demonstrates such an instance, and shows that it can be used to include arbitrary (known) paths in the working copy of a cloned repository on a victim's machine[^1], even if local file clones are forbidden by `protocol.file.allow`. This happens in a few parts: 1. We first call `get_repo_path()` to see if the remote is a local path. If it is, we replace the repo name with its absolute path. 2. We then call `transport_get()` on the repo name and decide how to access it. If it was turned into an absolute path in the previous step, then we should always treat it like a file. 3. We use `get_repo_path()` again, and set `is_local` as appropriate. But it's already too late to rewrite the repo name as an absolute path, since we've already fed it to the transport code. The attack works by including a submodule whose URL corresponds to a path on disk. In the below example, the repository "sub" is reachable via the dumb HTTP protocol at (something like): http://127.0.0.1:NNNN/dumb/sub.git However, the path "http:/127.0.0.1:NNNN/dumb" (that is, a top-level directory called "http:", then nested directories "127.0.0.1:NNNN", and "dumb") exists within the repository, too. To determine this, it first picks the appropriate transport, which is dumb HTTP. It then uses the remote's URL in order to determine whether the repository exists locally on disk. However, the malicious repository also contains an embedded stub repository which is the target of a symbolic link at the local path corresponding to the "sub" repository on disk (i.e., there is a symbolic link at "http:/127.0.0.1/dumb/sub.git", pointing to the stub repository via ".git/modules/sub/../../../repo"). This stub repository fools Git into thinking that a local repository exists at that URL and thus can be cloned locally. The affected call is in `get_repo_path()`, which in turn calls `get_repo_path_1()`, which locates a valid repository at that target. This then causes Git to set the `is_local` variable to "1", and in turn instructs Git to clone the repository using its local clone optimization via the `clone_local()` function. The exploit comes into play because the stub repository's top-level "$GIT_DIR/objects" directory is a symbolic link which can point to an arbitrary path on the victim's machine. `clone_local()` resolves the top-level "objects" directory through a `stat(2)` call, meaning that we read through the symbolic link and copy or hardlink the directory contents at the destination of the link. In other words, we can get steps (1) and (3) to disagree by leveraging the dangling symlink to pick a non-local transport in the first step, and then set is_local to "1" in the third step when cloning with `--separate-git-dir`, which makes the symlink non-dangling. This can result in data-exfiltration on the victim's machine when sensitive data is at a known path (e.g., "/home/$USER/.ssh"). The appropriate fix is two-fold: - Resolve the transport later on (to avoid using the local clone optimization with a non-local transport). - Avoid reading through the top-level "objects" directory when (correctly) using the clone_local() optimization. This patch merely demonstrates the issue. The following two patches will implement each part of the above fix, respectively. [^1]: Provided that any target directory does not contain symbolic links, in which case the changes from `6f054f9fb3` (builtin/clone.c: disallow `--local` clones with symlinks, 2022-07-28) will abort the clone. Reported-by: yvvdwf <yvvdwf@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-24 16:52:16 -08:00
Junio C Hamano	f8bf6b8f3d	Sync with maint-2.30 * maint-2.30: attr: adjust a mismatched data type	2023-01-19 13:45:23 -08:00
Johannes Schindelin	0227130244	attr: adjust a mismatched data type On platforms where `size_t` does not have the same width as `unsigned long`, passing a pointer to the former when a pointer to the latter is expected can lead to problems. Windows and 32-bit Linux are among the affected platforms. In this instance, we want to store the size of the blob that was read in that variable. However, `read_blob_data_from_index()` passes that pointer to `read_object_file()` which expects an `unsigned long *`. Which means that on affected platforms, the variable is not fully populated and part of its value is left uninitialized. (On Big-Endian platforms, this problem would be even worse.) The consequence is that depending on the uninitialized memory's contents, we may erroneously reject perfectly fine attributes. Let's address this by passing a pointer to a variable of the expected data type. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-19 13:38:06 -08:00
Junio C Hamano	82689d5e5d	Git 2.31.6 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-13 21:04:03 +09:00
Junio C Hamano	16128765d7	Sync with Git 2.30.7	2022-12-13 21:02:20 +09:00
Junio C Hamano	b7b37a3371	Git 2.30.7 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-13 20:56:43 +09:00
Junio C Hamano	6662a836eb	Merge branch 'ps/attr-limits' into maint-2.30	2022-12-09 16:05:52 +09:00
Junio C Hamano	3305300f4c	Merge branch 'ps/format-padding-fix' into maint-2.30	2022-12-09 16:02:39 +09:00
Patrick Steinhardt	304a50adff	pretty: restrict input lengths for padding and wrapping formats Both the padding and wrapping formatting directives allow the caller to specify an integer that ultimately leads to us adding this many chars to the result buffer. As a consequence, it is trivial to e.g. allocate 2GB of RAM via a single formatting directive and cause resource exhaustion on the machine executing this logic. Furthermore, it is debatable whether there are any sane usecases that require the user to pad data to 2GB boundaries or to indent wrapped data by 2GB. Restrict the input sizes to 16 kilobytes at a maximum to limit the amount of bytes that can be requested by the user. This is not meant as a fix because there are ways to trivially amplify the amount of data we generate via formatting directives; the real protection is achieved by the changes in previous steps to catch and avoid integer wraparound that causes us to under-allocate and access beyond the end of allocated memory reagions. But having such a limit significantly helps fuzzing the pretty format, because the fuzzer is otherwise quite fast to run out-of-memory as it discovers these formatters. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	f930a23943	utf8: refactor `strbuf_utf8_replace` to not rely on preallocated buffer In `strbuf_utf8_replace`, we preallocate the destination buffer and then use `memcpy` to copy bytes into it at computed offsets. This feels rather fragile and is hard to understand at times. Refactor the code to instead use `strbuf_add` and `strbuf_addstr` so that we can be sure that there is no possibility to perform an out-of-bounds write. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	81c2d4c3a5	utf8: fix checking for glyph width in `strbuf_utf8_replace()` In `strbuf_utf8_replace()`, we call `utf8_width()` to compute the width of the current glyph. If the glyph is a control character though it can be that `utf8_width()` returns `-1`, but because we assign this value to a `size_t` the conversion will cause us to underflow. This bug can easily be triggered with the following command: $ git log --pretty='format:xxx%<\|(1,trunc)%x10' >From all I can see though this seems to be a benign underflow that has no security-related consequences. Fix the bug by using an `int` instead. When we see a control character, we now copy it into the target buffer but don't advance the current width of the string. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	937b71cc8b	utf8: fix overflow when returning string width The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is `int`, but we operate on string lengths which are typically of type `size_t`. This means that when the string is longer than `INT_MAX`, we will overflow and thus return a negative result. This can lead to an out-of-bounds write with `--pretty=format:%<1)%B` and a commit message that is 2^31+1 bytes long: ================================================================= ==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8 WRITE of size 2147483649 at 0x603000001168 thread T0 #0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 #1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763 #2 0x5612bbb1087a in format_commit_item pretty.c:1801 #3 0x5612bbc33bab in strbuf_expand strbuf.c:429 #4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869 #5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161 #6 0x5612bba0a4d5 in show_log log-tree.c:781 #7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117 #8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508 #9 0x5612bb69235b in cmd_log_walk builtin/log.c:549 #10 0x5612bb6951a2 in cmd_log builtin/log.c:883 #11 0x5612bb56c993 in run_builtin git.c:466 #12 0x5612bb56d397 in handle_builtin git.c:721 #13 0x5612bb56db07 in run_argv git.c:788 #14 0x5612bb56e8a7 in cmd_main git.c:923 #15 0x5612bb803682 in main common-main.c:57 #16 0x7f95c4c3c28f (/usr/lib/libc.so.6+0x2328f) #17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115 0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168) allocated by thread T0 here: #0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85 #1 0x5612bbcdd556 in xrealloc wrapper.c:136 #2 0x5612bbc310a3 in strbuf_grow strbuf.c:99 #3 0x5612bbc32acd in strbuf_add strbuf.c:298 #4 0x5612bbc33aec in strbuf_expand strbuf.c:418 #5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869 #6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161 #7 0x5612bba0a4d5 in show_log log-tree.c:781 #8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117 #9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508 #10 0x5612bb69235b in cmd_log_walk builtin/log.c:549 #11 0x5612bb6951a2 in cmd_log builtin/log.c:883 #12 0x5612bb56c993 in run_builtin git.c:466 #13 0x5612bb56d397 in handle_builtin git.c:721 #14 0x5612bb56db07 in run_argv git.c:788 #15 0x5612bb56e8a7 in cmd_main git.c:923 #16 0x5612bb803682 in main common-main.c:57 #17 0x7f95c4c3c28f (/usr/lib/libc.so.6+0x2328f) SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy Shadow bytes around the buggy address: 0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd 0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa 0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa 0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd =>0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa 0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==26009==ABORTING Now the proper fix for this would be to convert both functions to return an `size_t` instead of an `int`. But given that this commit may be part of a security release, let's instead do the minimal viable fix and die in case we see an overflow. Add a test that would have previously caused us to crash. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	17d23e8a38	utf8: fix returning negative string width The `utf8_strnwidth()` function calls `utf8_width()` in a loop and adds its returned width to the end result. `utf8_width()` can return `-1` though in case it reads a control character, which means that the computed string width is going to be wrong. In the worst case where there are more control characters than non-control characters, we may even return a negative string width. Fix this bug by treating control characters as having zero width. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	522cc87fdc	utf8: fix truncated string lengths in `utf8_strnwidth()` The `utf8_strnwidth()` function accepts an optional string length as input parameter. This parameter can either be set to `-1`, in which case we call `strlen()` on the input. Or it can be set to a positive integer that indicates a precomputed length, which callers typically compute by calling `strlen()` at some point themselves. The input parameter is an `int` though, whereas `strlen()` returns a `size_t`. This can lead to implementation-defined behaviour though when the `size_t` cannot be represented by the `int`. In the general case though this leads to wrap-around and thus to negative string sizes, which is sure enough to not lead to well-defined behaviour. Fix this by accepting a `size_t` instead of an `int` as string length. While this takes away the ability of callers to simply pass in `-1` as string length, it really is trivial enough to convert them to instead pass in `strlen()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	48050c42c7	pretty: fix integer overflow in wrapping format The `%w(width,indent1,indent2)` formatting directive can be used to rewrap text to a specific width and is designed after git-shortlog(1)'s `-w` parameter. While the three parameters are all stored as `size_t` internally, `strbuf_add_wrapped_text()` accepts integers as input. As a result, the casted integers may overflow. As these now-negative integers are later on passed to `strbuf_addchars()`, we will ultimately run into implementation-defined behaviour due to casting a negative number back to `size_t` again. On my platform, this results in trying to allocate 9000 petabyte of memory. Fix this overflow by using `cast_size_t_to_int()` so that we reject inputs that cannot be represented as an integer. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	1de69c0cdd	pretty: fix adding linefeed when placeholder is not expanded When a formatting directive has a `+` or ` ` after the `%`, then we add either a line feed or space if the placeholder expands to a non-empty string. In specific cases though this logic doesn't work as expected, and we try to add the character even in the case where the formatting directive is empty. One such pattern is `%w(1)%+d%+w(2)`. `%+d` expands to reference names pointing to a certain commit, like in `git log --decorate`. For a tagged commit this would for example expand to `\n (tag: v1.0.0)`, which has a leading newline due to the `+` modifier and a space added by `%d`. Now the second wrapping directive will cause us to rewrap the text to `\n(tag:\nv1.0.0)`, which is one byte shorter due to the missing leading space. The code that handles the `+` magic now notices that the length has changed and will thus try to insert a leading line feed at the original posititon. But as the string was shortened, the original position is past the buffer's boundary and thus we die with an error. Now there are two issues here: 1. We check whether the buffer length has changed, not whether it has been extended. This causes us to try and add the character past the string boundary. 2. The current logic does not make any sense whatsoever. When the string got expanded due to the rewrap, putting the separator into the original position is likely to put it somewhere into the middle of the rewrapped contents. It is debatable whether `%+w()` makes any sense in the first place. Strictly speaking, the placeholder never expands to a non-empty string, and consequentially we shouldn't ever accept this combination. We thus fix the bug by simply refusing `%+w()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	f6e0b9f389	pretty: fix out-of-bounds read when parsing invalid padding format An out-of-bounds read can be triggered when parsing an incomplete padding format string passed via `--pretty=format` or in Git archives when files are marked with the `export-subst` gitattribute. This bug exists since we have introduced support for truncating output via the `trunc` keyword `a7f01c6b4d` (pretty: support truncating in %>, %< and %><, 2013-04-19). Before this commit, we used to find the end of the formatting string by using strchr(3P). This function returns a `NULL` pointer in case the character in question wasn't found. The subsequent check whether any character was found thus simply checked the returned pointer. After the commit we switched to strcspn(3P) though, which only returns the offset to the first found character or to the trailing NUL byte. As the end pointer is now computed by adding the offset to the start pointer it won't be `NULL` anymore, and as a consequence the check doesn't do anything anymore. The out-of-bounds data that is being read can in fact end up in the formatted string. As a consequence, it is possible to leak memory contents either by calling git-log(1) or via git-archive(1) when any of the archived files is marked with the `export-subst` gitattribute. ==10888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000398 at pc 0x7f0356047cb2 bp 0x7fff3ffb95d0 sp 0x7fff3ffb8d78 READ of size 1 at 0x602000000398 thread T0 #0 0x7f0356047cb1 in __interceptor_strchrnul /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725 #1 0x563b7cec9a43 in strbuf_expand strbuf.c:417 #2 0x563b7cda7060 in repo_format_commit_message pretty.c:1869 #3 0x563b7cda8d0f in pretty_print_commit pretty.c:2161 #4 0x563b7cca04c8 in show_log log-tree.c:781 #5 0x563b7cca36ba in log_tree_commit log-tree.c:1117 #6 0x563b7c927ed5 in cmd_log_walk_no_free builtin/log.c:508 #7 0x563b7c92835b in cmd_log_walk builtin/log.c:549 #8 0x563b7c92b1a2 in cmd_log builtin/log.c:883 #9 0x563b7c802993 in run_builtin git.c:466 #10 0x563b7c803397 in handle_builtin git.c:721 #11 0x563b7c803b07 in run_argv git.c:788 #12 0x563b7c8048a7 in cmd_main git.c:923 #13 0x563b7ca99682 in main common-main.c:57 #14 0x7f0355e3c28f (/usr/lib/libc.so.6+0x2328f) #15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115 0x602000000398 is located 0 bytes to the right of 8-byte region [0x602000000390,0x602000000398) allocated by thread T0 here: #0 0x7f0356072faa in __interceptor_strdup /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x563b7cf7317c in xstrdup wrapper.c:39 #2 0x563b7cd9a06a in save_user_format pretty.c:40 #3 0x563b7cd9b3e5 in get_commit_format pretty.c:173 #4 0x563b7ce54ea0 in handle_revision_opt revision.c:2456 #5 0x563b7ce597c9 in setup_revisions revision.c:2850 #6 0x563b7c9269e0 in cmd_log_init_finish builtin/log.c:269 #7 0x563b7c927362 in cmd_log_init builtin/log.c:348 #8 0x563b7c92b193 in cmd_log builtin/log.c:882 #9 0x563b7c802993 in run_builtin git.c:466 #10 0x563b7c803397 in handle_builtin git.c:721 #11 0x563b7c803b07 in run_argv git.c:788 #12 0x563b7c8048a7 in cmd_main git.c:923 #13 0x563b7ca99682 in main common-main.c:57 #14 0x7f0355e3c28f (/usr/lib/libc.so.6+0x2328f) #15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115 SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725 in __interceptor_strchrnul Shadow bytes around the buggy address: 0x0c047fff8020: fa fa fd fd fa fa 00 06 fa fa 05 fa fa fa fd fd 0x0c047fff8030: fa fa 00 02 fa fa 06 fa fa fa 05 fa fa fa fd fd 0x0c047fff8040: fa fa 00 07 fa fa 03 fa fa fa fd fd fa fa 00 00 0x0c047fff8050: fa fa 00 01 fa fa fd fd fa fa 00 00 fa fa 00 01 0x0c047fff8060: fa fa 00 06 fa fa 00 06 fa fa 05 fa fa fa 05 fa =>0x0c047fff8070: fa fa 00[fa]fa fa fd fa fa fa fd fd fa fa fd fd 0x0c047fff8080: fa fa fd fd fa fa 00 00 fa fa 00 fa fa fa fd fa 0x0c047fff8090: fa fa fd fd fa fa 00 00 fa fa fa fa fa fa fa fa 0x0c047fff80a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff80b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff80c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==10888==ABORTING Fix this bug by checking whether `end` points at the trailing NUL byte. Add a test which catches this out-of-bounds read and which demonstrates that we used to write out-of-bounds data into the formatted message. Reported-by: Markus Vervier <markus.vervier@x41-dsec.de> Original-patch-by: Markus Vervier <markus.vervier@x41-dsec.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	b49f309aa1	pretty: fix out-of-bounds read when left-flushing with stealing With the `%>>(<N>)` pretty formatter, you can ask git-log(1) et al to steal spaces. To do so we need to look ahead of the next token to see whether there are spaces there. This loop takes into account ANSI sequences that end with an `m`, and if it finds any it will skip them until it finds the first space. While doing so it does not take into account the buffer's limits though and easily does an out-of-bounds read. Add a test that hits this behaviour. While we don't have an easy way to verify this, the test causes the following failure when run with `SANITIZE=address`: ==37941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000baf at pc 0x55ba6f88e0d0 bp 0x7ffc84c50d20 sp 0x7ffc84c50d10 READ of size 1 at 0x603000000baf thread T0 #0 0x55ba6f88e0cf in format_and_pad_commit pretty.c:1712 #1 0x55ba6f88e7b4 in format_commit_item pretty.c:1801 #2 0x55ba6f9b1ae4 in strbuf_expand strbuf.c:429 #3 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869 #4 0x55ba6f890ccf in pretty_print_commit pretty.c:2161 #5 0x55ba6f7884c8 in show_log log-tree.c:781 #6 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117 #7 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508 #8 0x55ba6f41035b in cmd_log_walk builtin/log.c:549 #9 0x55ba6f4131a2 in cmd_log builtin/log.c:883 #10 0x55ba6f2ea993 in run_builtin git.c:466 #11 0x55ba6f2eb397 in handle_builtin git.c:721 #12 0x55ba6f2ebb07 in run_argv git.c:788 #13 0x55ba6f2ec8a7 in cmd_main git.c:923 #14 0x55ba6f581682 in main common-main.c:57 #15 0x7f2d08c3c28f (/usr/lib/libc.so.6+0x2328f) #16 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #17 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115 0x603000000baf is located 1 bytes to the left of 24-byte region [0x603000000bb0,0x603000000bc8) allocated by thread T0 here: #0 0x7f2d08ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85 #1 0x55ba6fa5b494 in xrealloc wrapper.c:136 #2 0x55ba6f9aefdc in strbuf_grow strbuf.c:99 #3 0x55ba6f9b0a06 in strbuf_add strbuf.c:298 #4 0x55ba6f9b1a25 in strbuf_expand strbuf.c:418 #5 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869 #6 0x55ba6f890ccf in pretty_print_commit pretty.c:2161 #7 0x55ba6f7884c8 in show_log log-tree.c:781 #8 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117 #9 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508 #10 0x55ba6f41035b in cmd_log_walk builtin/log.c:549 #11 0x55ba6f4131a2 in cmd_log builtin/log.c:883 #12 0x55ba6f2ea993 in run_builtin git.c:466 #13 0x55ba6f2eb397 in handle_builtin git.c:721 #14 0x55ba6f2ebb07 in run_argv git.c:788 #15 0x55ba6f2ec8a7 in cmd_main git.c:923 #16 0x55ba6f581682 in main common-main.c:57 #17 0x7f2d08c3c28f (/usr/lib/libc.so.6+0x2328f) #18 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #19 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115 SUMMARY: AddressSanitizer: heap-buffer-overflow pretty.c:1712 in format_and_pad_commit Shadow bytes around the buggy address: 0x0c067fff8120: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd 0x0c067fff8130: fd fd fa fa fd fd fd fd fa fa fd fd fd fa fa fa 0x0c067fff8140: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa 0x0c067fff8150: fa fa fd fd fd fd fa fa 00 00 00 fa fa fa fd fd 0x0c067fff8160: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa =>0x0c067fff8170: fd fd fd fa fa[fa]00 00 00 fa fa fa 00 00 00 fa 0x0c067fff8180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8190: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff81a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff81b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff81c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Luckily enough, this would only cause us to copy the out-of-bounds data into the formatted commit in case we really had an ANSI sequence preceding our buffer. So this bug likely has no security consequences. Fix it regardless by not traversing past the buffer's start. Reported-by: Patrick Steinhardt <ps@pks.im> Reported-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Patrick Steinhardt	81dc898df9	pretty: fix out-of-bounds write caused by integer overflow When using a padding specifier in the pretty format passed to git-log(1) we need to calculate the string length in several places. These string lengths are stored in `int`s though, which means that these can easily overflow when the input lengths exceeds 2GB. This can ultimately lead to an out-of-bounds write when these are used in a call to memcpy(3P): ==8340==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1ec62f97fe at pc 0x7f2127e5f427 bp 0x7ffd3bd63de0 sp 0x7ffd3bd63588 WRITE of size 1 at 0x7f1ec62f97fe thread T0 #0 0x7f2127e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 #1 0x5628e96aa605 in format_and_pad_commit pretty.c:1762 #2 0x5628e96aa7f4 in format_commit_item pretty.c:1801 #3 0x5628e97cdb24 in strbuf_expand strbuf.c:429 #4 0x5628e96ab060 in repo_format_commit_message pretty.c:1869 #5 0x5628e96acd0f in pretty_print_commit pretty.c:2161 #6 0x5628e95a44c8 in show_log log-tree.c:781 #7 0x5628e95a76ba in log_tree_commit log-tree.c:1117 #8 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508 #9 0x5628e922c35b in cmd_log_walk builtin/log.c:549 #10 0x5628e922f1a2 in cmd_log builtin/log.c:883 #11 0x5628e9106993 in run_builtin git.c:466 #12 0x5628e9107397 in handle_builtin git.c:721 #13 0x5628e9107b07 in run_argv git.c:788 #14 0x5628e91088a7 in cmd_main git.c:923 #15 0x5628e939d682 in main common-main.c:57 #16 0x7f2127c3c28f (/usr/lib/libc.so.6+0x2328f) #17 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #18 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115 0x7f1ec62f97fe is located 2 bytes to the left of 4831838265-byte region [0x7f1ec62f9800,0x7f1fe62f9839) allocated by thread T0 here: #0 0x7f2127ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85 #1 0x5628e98774d4 in xrealloc wrapper.c:136 #2 0x5628e97cb01c in strbuf_grow strbuf.c:99 #3 0x5628e97ccd42 in strbuf_addchars strbuf.c:327 #4 0x5628e96aa55c in format_and_pad_commit pretty.c:1761 #5 0x5628e96aa7f4 in format_commit_item pretty.c:1801 #6 0x5628e97cdb24 in strbuf_expand strbuf.c:429 #7 0x5628e96ab060 in repo_format_commit_message pretty.c:1869 #8 0x5628e96acd0f in pretty_print_commit pretty.c:2161 #9 0x5628e95a44c8 in show_log log-tree.c:781 #10 0x5628e95a76ba in log_tree_commit log-tree.c:1117 #11 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508 #12 0x5628e922c35b in cmd_log_walk builtin/log.c:549 #13 0x5628e922f1a2 in cmd_log builtin/log.c:883 #14 0x5628e9106993 in run_builtin git.c:466 #15 0x5628e9107397 in handle_builtin git.c:721 #16 0x5628e9107b07 in run_argv git.c:788 #17 0x5628e91088a7 in cmd_main git.c:923 #18 0x5628e939d682 in main common-main.c:57 #19 0x7f2127c3c28f (/usr/lib/libc.so.6+0x2328f) #20 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #21 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115 SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy Shadow bytes around the buggy address: 0x0fe458c572a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0fe458c572b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0fe458c572c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0fe458c572d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0fe458c572e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa =>0x0fe458c572f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa] 0x0fe458c57300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0fe458c57310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0fe458c57320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0fe458c57330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0fe458c57340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==8340==ABORTING The pretty format can also be used in `git archive` operations via the `export-subst` attribute. So this is what in our opinion makes this a critical issue in the context of Git forges which allow to download an archive of user supplied Git repositories. Fix this vulnerability by using `size_t` instead of `int` to track the string lengths. Add tests which detect this vulnerability when Git is compiled with the address sanitizer. Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com> Original-patch-by: Joern Schneeweisz <jschneeweisz@gitlab.com> Modified-by: Taylor Blau <me@ttalorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:21 +09:00
Carlo Marcelo Arenas Belón	a244dc5b0a	test-lib: add prerequisite for 64-bit platforms Allow tests that assume a 64-bit `size_t` to be skipped in 32-bit platforms and regardless of the size of `long`. This imitates the `LONG_IS_64BIT` prerequisite. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-09 14:26:04 +09:00
Patrick Steinhardt	3c50032ff5	attr: ignore overly large gitattributes files Similar as with the preceding commit, start ignoring gitattributes files that are overly large to protect us against out-of-bounds reads and writes caused by integer overflows. Unfortunately, we cannot just define "overly large" in terms of any preexisting limits in the codebase. Instead, we choose a very conservative limit of 100MB. This is plenty of room for specifying gitattributes, and incidentally it is also the limit for blob sizes for GitHub. While we don't want GitHub to dictate limits here, it is still sensible to use this fact for an informed decision given that it is hosting a huge set of repositories. Furthermore, over at GitLab we scanned a subset of repositories for their root-level attribute files. We found that 80% of them have a gitattributes file smaller than 100kB, 99.99% have one smaller than 1MB, and only a single repository had one that was almost 3MB in size. So enforcing a limit of 100MB seems to give us ample of headroom. With this limit in place we can be reasonably sure that there is no easy way to exploit the gitattributes file via integer overflows anymore. Furthermore, it protects us against resource exhaustion caused by allocating the in-memory data structures required to represent the parsed attributes. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:50:03 +09:00
Patrick Steinhardt	dfa6b32b5e	attr: ignore attribute lines exceeding 2048 bytes There are two different code paths to read gitattributes: once via a file, and once via the index. These two paths used to behave differently because when reading attributes from a file, we used fgets(3P) with a buffer size of 2kB. Consequentially, we silently truncate line lengths when lines are longer than that and will then parse the remainder of the line as a new pattern. It goes without saying that this is entirely unexpected, but it's even worse that the behaviour depends on how the gitattributes are parsed. While this is simply wrong, the silent truncation saves us with the recently discovered vulnerabilities that can cause out-of-bound writes or reads with unreasonably long lines due to integer overflows. As the common path is to read gitattributes via the worktree file instead of via the index, we can assume that any gitattributes file that had lines longer than that is already broken anyway. So instead of lifting the limit here, we can double down on it to fix the vulnerabilities. Introduce an explicit line length limit of 2kB that is shared across all paths that read attributes and ignore any line that hits this limit while printing a warning. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:33:07 +09:00
Patrick Steinhardt	d74b1fd54f	attr: fix silently splitting up lines longer than 2048 bytes When reading attributes from a file we use fgets(3P) with a buffer size of 2048 bytes. This means that as soon as a line exceeds the buffer size we split it up into multiple parts and parse each of them as a separate pattern line. This is of course not what the user intended, and even worse the behaviour is inconsistent with how we read attributes from the index. Fix this bug by converting the code to use `strbuf_getline()` instead. This will indeed read in the whole line, which may theoretically lead to an out-of-memory situation when the gitattributes file is huge. We're about to reject any gitattributes files larger than 100MB in the next commit though, which makes this less of a concern. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:29:30 +09:00
Patrick Steinhardt	a60a66e409	attr: harden allocation against integer overflows When parsing an attributes line, we need to allocate an array that holds all attributes specified for the given file pattern. The calculation to determine the number of bytes that need to be allocated was prone to an overflow though when there was an unreasonable amount of attributes. Harden the allocation by instead using the `st_` helper functions that cause us to die when we hit an integer overflow. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Patrick Steinhardt	e1e12e97ac	attr: fix integer overflow with more than INT_MAX macros Attributes have a field that tracks the position in the `all_attrs` array they're stored inside. This field gets set via `hashmap_get_size` when adding the attribute to the global map of attributes. But while the field is of type `int`, the value returned by `hashmap_get_size` is an `unsigned int`. It can thus happen that the value overflows, where we would now dereference teh `all_attrs` array at an out-of-bounds value. We do have a sanity check for this overflow via an assert that verifies the index matches the new hashmap's size. But asserts are not a proper mechanism to detect against any such overflows as they may not in fact be compiled into production code. Fix this by using an `unsigned int` to track the index and convert the assert to a call `die()`. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Patrick Steinhardt	447ac906e1	attr: fix out-of-bounds read with unreasonable amount of patterns The `struct attr_stack` tracks the stack of all patterns together with their attributes. When parsing a gitattributes file that has more than 2^31 such patterns though we may trigger multiple out-of-bounds reads on 64 bit platforms. This is because while the `num_matches` variable is an unsigned integer, we always use a signed integer to iterate over them. I have not been able to reproduce this issue due to memory constraints on my systems. But despite the out-of-bounds reads, the worst thing that can seemingly happen is to call free(3P) with a garbage pointer when calling `attr_stack_free()`. Fix this bug by using unsigned integers to iterate over the array. While this makes the iteration somewhat awkward when iterating in reverse, it is at least better than knowingly running into an out-of-bounds read. While at it, convert the call to `ALLOC_GROW` to use `ALLOC_GROW_BY` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Patrick Steinhardt	34ace8bad0	attr: fix out-of-bounds write when parsing huge number of attributes It is possible to trigger an integer overflow when parsing attribute names when there are more than 2^31 of them for a single pattern. This can either lead to us dying due to trying to request too many bytes: blob=$(perl -e 'print "f" . " a=" x 2147483649' \| git hash-object -w --stdin) git update-index --add --cacheinfo 100644,$blob,.gitattributes git attr-check --all file ================================================================= ==1022==ERROR: AddressSanitizer: requested allocation size 0xfffffff800000032 (0xfffffff800001038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0) #0 0x7fd3efabf411 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x5563a0a1e3d3 in xcalloc wrapper.c:150 #2 0x5563a058d005 in parse_attr_line attr.c:384 #3 0x5563a058e661 in handle_attr_line attr.c:660 #4 0x5563a058eddb in read_attr_from_index attr.c:769 #5 0x5563a058ef12 in read_attr attr.c:797 #6 0x5563a058f24c in bootstrap_attr_stack attr.c:867 #7 0x5563a058f4a3 in prepare_attr_stack attr.c:902 #8 0x5563a05905da in collect_some_attrs attr.c:1097 #9 0x5563a059093d in git_all_attrs attr.c:1128 #10 0x5563a02f636e in check_attr builtin/check-attr.c:67 #11 0x5563a02f6c12 in cmd_check_attr builtin/check-attr.c:183 #12 0x5563a02aa993 in run_builtin git.c:466 #13 0x5563a02ab397 in handle_builtin git.c:721 #14 0x5563a02abb2b in run_argv git.c:788 #15 0x5563a02ac991 in cmd_main git.c:926 #16 0x5563a05432bd in main common-main.c:57 #17 0x7fd3ef82228f (/usr/lib/libc.so.6+0x2328f) ==1022==HINT: if you don't care about these errors you may set allocator_may_return_null=1 SUMMARY: AddressSanitizer: allocation-size-too-big /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 in __interceptor_calloc ==1022==ABORTING Or, much worse, it can lead to an out-of-bounds write because we underallocate and then memcpy(3P) into an array: perl -e ' print "A " . "\rh="x2000000000; print "\rh="x2000000000; print "\rh="x294967294 . "\n" ' >.gitattributes git add .gitattributes git commit -am "evil attributes" $ git clone --quiet /path/to/repo ================================================================= ==15062==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002550 at pc 0x5555559884d5 bp 0x7fffffffbc60 sp 0x7fffffffbc58 WRITE of size 8 at 0x602000002550 thread T0 #0 0x5555559884d4 in parse_attr_line attr.c:393 #1 0x5555559884d4 in handle_attr_line attr.c:660 #2 0x555555988902 in read_attr_from_index attr.c:784 #3 0x555555988902 in read_attr_from_index attr.c:747 #4 0x555555988a1d in read_attr attr.c:800 #5 0x555555989b0c in bootstrap_attr_stack attr.c:882 #6 0x555555989b0c in prepare_attr_stack attr.c:917 #7 0x555555989b0c in collect_some_attrs attr.c:1112 #8 0x55555598b141 in git_check_attr attr.c:1126 #9 0x555555a13004 in convert_attrs convert.c:1311 #10 0x555555a95e04 in checkout_entry_ca entry.c:553 #11 0x555555d58bf6 in checkout_entry entry.h:42 #12 0x555555d58bf6 in check_updates unpack-trees.c:480 #13 0x555555d5eb55 in unpack_trees unpack-trees.c:2040 #14 0x555555785ab7 in checkout builtin/clone.c:724 #15 0x555555785ab7 in cmd_clone builtin/clone.c:1384 #16 0x55555572443c in run_builtin git.c:466 #17 0x55555572443c in handle_builtin git.c:721 #18 0x555555727872 in run_argv git.c:788 #19 0x555555727872 in cmd_main git.c:926 #20 0x555555721fa0 in main common-main.c:57 #21 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308 #22 0x555555723f39 in _start (git+0x1cff39) 0x602000002552 is located 0 bytes to the right of 2-byte region [0x602000002550,0x602000002552) allocated by thread T0 here: #0 0x7ffff768c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x555555d7fff7 in xcalloc wrapper.c:150 #2 0x55555598815f in parse_attr_line attr.c:384 #3 0x55555598815f in handle_attr_line attr.c:660 #4 0x555555988902 in read_attr_from_index attr.c:784 #5 0x555555988902 in read_attr_from_index attr.c:747 #6 0x555555988a1d in read_attr attr.c:800 #7 0x555555989b0c in bootstrap_attr_stack attr.c:882 #8 0x555555989b0c in prepare_attr_stack attr.c:917 #9 0x555555989b0c in collect_some_attrs attr.c:1112 #10 0x55555598b141 in git_check_attr attr.c:1126 #11 0x555555a13004 in convert_attrs convert.c:1311 #12 0x555555a95e04 in checkout_entry_ca entry.c:553 #13 0x555555d58bf6 in checkout_entry entry.h:42 #14 0x555555d58bf6 in check_updates unpack-trees.c:480 #15 0x555555d5eb55 in unpack_trees unpack-trees.c:2040 #16 0x555555785ab7 in checkout builtin/clone.c:724 #17 0x555555785ab7 in cmd_clone builtin/clone.c:1384 #18 0x55555572443c in run_builtin git.c:466 #19 0x55555572443c in handle_builtin git.c:721 #20 0x555555727872 in run_argv git.c:788 #21 0x555555727872 in cmd_main git.c:926 #22 0x555555721fa0 in main common-main.c:57 #23 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: heap-buffer-overflow attr.c:393 in parse_attr_line Shadow bytes around the buggy address: 0x0c047fff8450: fa fa 00 02 fa fa 00 07 fa fa fd fd fa fa 00 00 0x0c047fff8460: fa fa 02 fa fa fa fd fd fa fa 00 06 fa fa 05 fa 0x0c047fff8470: fa fa fd fd fa fa 00 02 fa fa 06 fa fa fa 05 fa 0x0c047fff8480: fa fa 07 fa fa fa fd fd fa fa 00 01 fa fa 00 02 0x0c047fff8490: fa fa 00 03 fa fa 00 fa fa fa 00 01 fa fa 00 03 =>0x0c047fff84a0: fa fa 00 01 fa fa 00 02 fa fa[02]fa fa fa fa fa 0x0c047fff84b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff84c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff84d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff84e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==15062==ABORTING Fix this bug by using `size_t` instead to count the number of attributes so that this value cannot reasonably overflow without running out of memory before already. Reported-by: Markus Vervier <markus.vervier@x41-dsec.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Patrick Steinhardt	2455720950	attr: fix integer overflow when parsing huge attribute names It is possible to trigger an integer overflow when parsing attribute names that are longer than 2^31 bytes because we assign the result of strlen(3P) to an `int` instead of to a `size_t`. This can lead to an abort in vsnprintf(3P) with the following reproducer: blob=$(perl -e 'print "A " . "B"x2147483648 . "\n"' \| git hash-object -w --stdin) git update-index --add --cacheinfo 100644,$blob,.gitattributes git check-attr --all path BUG: strbuf.c:400: your vsnprintf is broken (returned -1) But furthermore, assuming that the attribute name is even longer than that, it can cause us to silently truncate the attribute and thus lead to wrong results. Fix this integer overflow by using a `size_t` instead. This fixes the silent truncation of attribute names, but it only partially fixes the BUG we hit: even though the initial BUG is fixed, we can still hit a BUG when parsing invalid attribute lines via `report_invalid_attr()`. This is due to an underlying design issue in vsnprintf(3P) which only knows to return an `int`, and thus it may always overflow with large inputs. This issue is benign though: the worst that can happen is that the error message is misreported to be either truncated or too long, but due to the buffer being NUL terminated we wouldn't ever do an out-of-bounds read here. Reported-by: Markus Vervier <markus.vervier@x41-dsec.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Patrick Steinhardt	8d0d48cf21	attr: fix out-of-bounds read with huge attribute names There is an out-of-bounds read possible when parsing gitattributes that have an attribute that is 2^31+1 bytes long. This is caused due to an integer overflow when we assign the result of strlen(3P) to an `int`, where we use the wrapped-around value in a subsequent call to memcpy(3P). The following code reproduces the issue: blob=$(perl -e 'print "a" x 2147483649 . " attr"' \| git hash-object -w --stdin) git update-index --add --cacheinfo 100644,$blob,.gitattributes git check-attr --all file AddressSanitizer:DEADLYSIGNAL ================================================================= ==8451==ERROR: AddressSanitizer: SEGV on unknown address 0x7f93efa00800 (pc 0x7f94f1f8f082 bp 0x7ffddb59b3a0 sp 0x7ffddb59ab28 T0) ==8451==The signal is caused by a READ memory access. #0 0x7f94f1f8f082 (/usr/lib/libc.so.6+0x176082) #1 0x7f94f2047d9c in __interceptor_strspn /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:752 #2 0x560e190f7f26 in parse_attr_line attr.c:375 #3 0x560e190f9663 in handle_attr_line attr.c:660 #4 0x560e190f9ddd in read_attr_from_index attr.c:769 #5 0x560e190f9f14 in read_attr attr.c:797 #6 0x560e190fa24e in bootstrap_attr_stack attr.c:867 #7 0x560e190fa4a5 in prepare_attr_stack attr.c:902 #8 0x560e190fb5dc in collect_some_attrs attr.c:1097 #9 0x560e190fb93f in git_all_attrs attr.c:1128 #10 0x560e18e6136e in check_attr builtin/check-attr.c:67 #11 0x560e18e61c12 in cmd_check_attr builtin/check-attr.c:183 #12 0x560e18e15993 in run_builtin git.c:466 #13 0x560e18e16397 in handle_builtin git.c:721 #14 0x560e18e16b2b in run_argv git.c:788 #15 0x560e18e17991 in cmd_main git.c:926 #16 0x560e190ae2bd in main common-main.c:57 #17 0x7f94f1e3c28f (/usr/lib/libc.so.6+0x2328f) #18 0x7f94f1e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) #19 0x560e18e110e4 in _start ../sysdeps/x86_64/start.S:115 AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x176082) ==8451==ABORTING Fix this bug by converting the variable to a `size_t` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Patrick Steinhardt	eb22e7dfa2	attr: fix overflow when upserting attribute with overly long name The function `git_attr_internal()` is called to upsert attributes into the global map. And while all callers pass a `size_t`, the function itself accepts an `int` as the attribute name's length. This can lead to an integer overflow in case the attribute name is longer than `INT_MAX`. Now this overflow seems harmless as the first thing we do is to call `attr_name_valid()`, and that function only succeeds in case all chars in the range of `namelen` match a certain small set of chars. We thus can't do an out-of-bounds read as NUL is not part of that set and all strings passed to this function are NUL-terminated. And furthermore, we wouldn't ever read past the current attribute name anyway due to the same reason. And if validation fails we will return early. On the other hand it feels fragile to rely on this behaviour, even more so given that we pass `namelen` to `FLEX_ALLOC_MEM()`. So let's instead just do the correct thing here and accept a `size_t` as line length. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-05 15:14:16 +09:00
Taylor Blau	ecf9b4a443	Git 2.31.5 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:39:26 -04:00
Taylor Blau	122512967e	Sync with 2.30.6 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:39:15 -04:00
Johannes Schindelin	5b1c746c35	Git 2.31.4 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-06-23 12:35:25 +02:00
Johannes Schindelin	2f8809f9a1	Sync with 2.30.5 * maint-2.30: Git 2.30.5 setup: tighten ownership checks post CVE-2022-24765 git-compat-util: allow root to access both SUDO_UID and root owned t0034: add negative tests and allow git init to mostly work under sudo git-compat-util: avoid failing dir ownership checks if running privileged t: regression git needs safe.directory when using sudo	2022-06-23 12:35:23 +02:00
Junio C Hamano	09f66d65f8	Git 2.31.3 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 15:21:08 -07:00
Johannes Schindelin	44de39c45c	Git 2.31.2 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-03-24 00:24:29 +01:00
Johannes Schindelin	6a2381a3e5	Sync with 2.30.3 * maint-2.30: Git 2.30.3 setup_git_directory(): add an owner check for the top-level directory Add a function to determine whether a path is owned by the current user	2022-03-24 00:24:29 +01:00
Junio C Hamano	48bf2fa8ba	Git 2.31.1 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-26 14:49:41 -07:00
Junio C Hamano	ef486a9ecf	Merge branch 'tb/git-mv-icase-fix' Fix a corner case bug in "git mv" on case insensitive systems, which was introduced in 2.29 timeframe. * tb/git-mv-icase-fix: git mv foo FOO ; git mv foo bar gave an assert	2021-03-19 15:25:40 -07:00
Junio C Hamano	bfcc6e2a68	Merge branch 'rs/xcalloc-takes-nelem-first' Code cleanup. * rs/xcalloc-takes-nelem-first: fix xcalloc() argument order	2021-03-19 15:25:39 -07:00
Junio C Hamano	af107029b1	Merge branch 'ah/make-fuzz-all-doc-update' Update insn in Makefile comments to run fuzz-all target. * ah/make-fuzz-all-doc-update: Makefile: update 'make fuzz-all' docs to reflect modern clang	2021-03-19 15:25:39 -07:00
Junio C Hamano	c691e918f4	Merge branch 'jk/slimmed-down' Unused code removal. * jk/slimmed-down: vcs-svn: remove header files as well	2021-03-19 15:25:38 -07:00
Junio C Hamano	92ccd7b752	Merge branch 'rs/calloc-array' CALLOC_ARRAY() macro replaces many uses of xcalloc(). * rs/calloc-array: cocci: allow xcalloc(1, size) use CALLOC_ARRAY git-compat-util.h: drop trailing semicolon from macro definition	2021-03-19 15:25:38 -07:00
Junio C Hamano	a8a0ac3234	Merge branch 'rs/avoid-null-statement-after-macro-call' Fix macros that can silently inject unintended null-statements. * rs/avoid-null-statement-after-macro-call: mem-pool: drop trailing semicolon from macro definition block-sha1: drop trailing semicolon from macro definition	2021-03-19 15:25:38 -07:00
Junio C Hamano	948e8ac534	Merge branch 'km/config-doc-typofix' Docfix. * km/config-doc-typofix: config.txt: add missing period	2021-03-19 15:25:38 -07:00
Junio C Hamano	cc930b7472	Merge branch 'jt/clone-unborn-head' Test fix. * jt/clone-unborn-head: t5606: run clone branch name test with protocol v2	2021-03-19 15:25:38 -07:00
Junio C Hamano	1dd4e74522	Merge branch 'js/fsmonitor-unpack-fix' The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases	2021-03-19 15:25:37 -07:00
Junio C Hamano	35381b13da	Merge branch 'jk/bisect-peel-tag-fix' "git bisect" reimplemented more in C during 2.30 timeframe did not take an annotated tag as a good/bad endpoint well. This regression has been corrected. * jk/bisect-peel-tag-fix: bisect: peel annotated tags to commits	2021-03-19 15:25:37 -07:00
Junio C Hamano	8779c141da	Merge branch 'jh/fsmonitor-prework' The fsmonitor interface read from its input without making sure there is something to read from. This bug is new in 2.31 timeframe. * jh/fsmonitor-prework: fsmonitor: avoid global-buffer-overflow READ when checking trivial response	2021-03-19 15:25:37 -07:00
Junio C Hamano	eabacfd9cb	Merge branch 'jc/calloc-fix' Code clean-up. * jc/calloc-fix: xcalloc: use CALLOC_ARRAY() when applicable	2021-03-19 15:25:37 -07:00
Johannes Schindelin	4abc57848d	fsmonitor: do not forget to release the token in `discard_index()` In `56c6910028` (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust `discard_index()` to release the "last-update" token: it is no longer a 64-bit number, but a free-form string that has been allocated. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 12:19:28 -07:00
Johannes Schindelin	3dfd30598b	fsmonitor: fix memory corruption in some corner cases In `56c6910028` (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust the part of `unpack_trees()` that copies the FSMonitor "last-update" information that we copy from the source index to the result index since `679f2f9fdd` (unpack-trees: skip stat on fsmonitor-valid files, 2019-11-20). Since the "last-update" information is no longer a 64-bit number, but a free-form string that has been allocated, we need to duplicate it rather than just copying it. This is important because there _are_ cases when `unpack_trees()` will perform a oneway merge that implicitly calls `refresh_fsmonitor()` (which will allocate that "last-update" token). This happens _after_ that token was copied into the result index. However, we _then_ call `check_updates()` on that index, which will _also_ call `refresh_fsmonitor()`, accessing the "last-update" string, which by now would be released already. In the instance that lead to this patch, this caused a segmentation fault during a lengthy, complicated rebase involving the todo command `reset` that (crucially) had to updated many files. Unfortunately, it seems very hard to trigger that crash, therefore this patch is not accompanied by a regression test. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 12:19:26 -07:00
Kyle Meyer	cfd409ed09	config.txt: add missing period Signed-off-by: Kyle Meyer <kyle@kyleam.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 11:25:15 -07:00
Jeff King	7730f85594	bisect: peel annotated tags to commits This patch fixes a bug where git-bisect doesn't handle receiving annotated tags as "git bisect good <tag>", etc. It's a regression in `27257bc466` (bisect--helper: reimplement `bisect_state` & `bisect_head` shell functions in C, 2020-10-15). The original shell code called: sha=$(git rev-parse --verify "$rev^{commit}") \|\| die "$(eval_gettext "Bad rev input: \$rev")" which will peel the input to a commit (or complain if that's not possible). But the C code just calls get_oid(), which will yield the oid of the tag. The fix is to peel to a commit. The error message here is a little non-idiomatic for Git (since it starts with a capital). I've mostly left it, as it matches the other converted messages (like the "Bad rev input" we print when get_oid() fails), though I did add an indication that it was the peeling that was the problem. It might be worth taking a pass through this converted code to modernize some of the error messages. Note also that the test does a bare "grep" (not i18ngrep) on the expected "X is the first bad commit" output message. This matches the rest of the test script. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 11:24:08 -07:00
Jonathan Tan	5f70859c15	t5606: run clone branch name test with protocol v2 `4f37d45706` ("clone: respect remote unborn HEAD", 2021-02-05) introduces a new feature (if the remote has an unborn HEAD, e.g. when the remote repository is empty, use it as the name of the branch) that only works in protocol v2, but did not ensure that one of its tests always uses protocol v2, and thus that test would fail if GIT_TEST_PROTOCOL_VERSION=0 (or 1) is used. Therefore, add "-c protocol.version=2" to the appropriate test. (The rest of the tests from that commit have "-c protocol.version=2" already added.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 11:19:36 -07:00
René Scharfe	116affac3f	mem-pool: drop trailing semicolon from macro definition Allow BLOCK_GROWTH_SIZE to be used like an integer literal by removing the trailing semicolon from its definition. Also wrap the expression in parentheses, to allow it to be used with operators without leading to unexpected results. It doesn't matter for the current use site, but make it follow standard macro rules anyway to avoid future surprises. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 10:20:16 -07:00
René Scharfe	3d8cbbf2c3	block-sha1: drop trailing semicolon from macro definition `23119ffb4e` (block-sha1: put expanded macro parameters in parentheses, 2012-07-22) added a trailing semicolon to the definition of SHA_MIX without explanation. It doesn't matter with the current code, but make sure to avoid potential surprises by removing it again. This allows the macro to be used almost like a function: Users can combine it with operators of their choice, but still must not pass an expression with side-effects as a parameter, as it would be evaluated multiple times. Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 10:20:01 -07:00
Andrzej Hunt	097ea2c848	fsmonitor: avoid global-buffer-overflow READ when checking trivial response query_result can be be an empty strbuf (STRBUF_INIT) - in that case trying to read 3 bytes triggers a buffer overflow read (as query_result.buf = '\0'). Therefore we need to check query_result's length before trying to read 3 bytes. This overflow was introduced in: `940b94f35c` (fsmonitor: log invocation of FSMonitor hook to trace2, 2021-02-03) It was found when running the test-suite against ASAN, and can be most easily reproduced with the following command: make GIT_TEST_OPTS="-v" DEFAULT_TEST_TARGET="t7519-status-fsmonitor.sh" \ SANITIZE=address DEVELOPER=1 test ==2235==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000019e6e5e at pc 0x00000043745c bp 0x7fffd382c520 sp 0x7fffd382bcc8 READ of size 3 at 0x0000019e6e5e thread T0 #0 0x43745b in MemcmpInterceptorCommon(void, int ()(void const, void const, unsigned long), void const, void const, unsigned long) /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:842:7 #1 0x43786d in bcmp /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:887:10 #2 0x80b146 in fsmonitor_is_trivial_response /home/ahunt/oss-fuzz/git/fsmonitor.c:192:10 #3 0x80b146 in query_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:175:7 #4 0x80a749 in refresh_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:267:21 #5 0x80bad1 in tweak_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:429:4 #6 0x90f040 in read_index_from /home/ahunt/oss-fuzz/git/read-cache.c:2321:3 #7 0x8e5d08 in repo_read_index_preload /home/ahunt/oss-fuzz/git/preload-index.c:164:15 #8 0x52dd45 in prepare_index /home/ahunt/oss-fuzz/git/builtin/commit.c:363:6 #9 0x52a188 in cmd_commit /home/ahunt/oss-fuzz/git/builtin/commit.c:1588:15 #10 0x4ce77e in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11 #11 0x4ccb18 in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3 #12 0x4cb01c in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4 #13 0x4cb01c in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19 #14 0x6aca8d in main /home/ahunt/oss-fuzz/git/common-main.c:52:11 #15 0x7fb027bf5349 in __libc_start_main (/lib64/libc.so.6+0x24349) #16 0x4206b9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120 0x0000019e6e5e is located 2 bytes to the left of global variable 'strbuf_slopbuf' defined in 'strbuf.c:51:6' (0x19e6e60) of size 1 'strbuf_slopbuf' is ascii string '' 0x0000019e6e5e is located 126 bytes to the right of global variable 'signals' defined in 'sigchain.c:11:31' (0x19e6be0) of size 512 SUMMARY: AddressSanitizer: global-buffer-overflow /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:842:7 in MemcmpInterceptorCommon(void, int ()(void const, void const, unsigned long), void const, void const, unsigned long) Shadow bytes around the buggy address: 0x000080334d70: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 0x000080334d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080334d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080334da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080334db0: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 =>0x000080334dc0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9[f9]01 f9 f9 f9 0x000080334dd0: f9 f9 f9 f9 03 f9 f9 f9 f9 f9 f9 f9 02 f9 f9 f9 0x000080334de0: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 0x000080334df0: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 0x000080334e00: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 01 f9 f9 f9 0x000080334e10: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc Signed-off-by: Andrzej Hunt <ajrhunt@google.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 10:00:20 -07:00
Junio C Hamano	1c57cc70ec	cocci: allow xcalloc(1, size) Allocating a pre-cleared single element is quite common and it is misleading to use CALLOC_ARRAY(); these allocations that would be affected without this change are not allocating an array. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-15 17:56:07 -07:00
Junio C Hamano	486f4bd183	xcalloc: use CALLOC_ARRAY() when applicable These are for codebase before Git 2.31 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-15 17:51:10 -07:00
Junio C Hamano	a5828ae6b5	Git 2.31 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-15 11:51:51 -07:00
Junio C Hamano	8775279891	Merge branch 'jn/mergetool-hideresolved-is-optional' Disable the recent mergetool's hideresolved feature by default for backward compatibility and safety. * jn/mergetool-hideresolved-is-optional: doc: describe mergetool configuration in git-mergetool(1) mergetool: do not enable hideResolved by default	2021-03-14 16:01:41 -07:00
Junio C Hamano	074d162eff	Merge branch 'tb/pack-revindex-on-disk' Fix for a topic in 'master'. * tb/pack-revindex-on-disk: pack-revindex.c: don't close unopened file descriptors	2021-03-14 16:01:41 -07:00
Junio C Hamano	5be1c70518	Merge tag 'l10n-2.31.0-rnd2' of git://github.com/git-l10n/git-po l10n for Git 2.31.0 round 2 * tag 'l10n-2.31.0-rnd2' of git://github.com/git-l10n/git-po: l10n: zh_CN: for git v2.31.0 l10n round 1 and 2 l10n: de.po: Update German translation for Git v2.31.0 l10n: pt_PT: add Portuguese translations part 1 l10n: vi.po(5104t): for git v2.31.0 l10n round 2 l10n: es: 2.31.0 round 2 l10n: Add translation team info l10n: start Indonesian translation l10n: zh_TW.po: v2.31.0 round 2 (15 untranslated) l10n: bg.po: Updated Bulgarian translation (5104t) l10n: fr: v2.31 rnd 2 l10n: tr: v2.31.0-rc1 l10n: sv.po: Update Swedish translation (5104t0f0u) l10n: git.pot: v2.31.0 round 2 (9 new, 8 removed) l10n: tr: v2.31.0-rc0 l10n: sv.po: Update Swedish translation (5103t0f0u) l10n: pl.po: Update translation l10n: fr: v2.31.0 rnd 1 l10n: git.pot: v2.31.0 round 1 (155 new, 89 removed) l10n: Update Catalan translation l10n: ru.po: update Russian translation	2021-03-14 15:50:36 -07:00
René Scharfe	8588aa8657	vcs-svn: remove header files as well `fc47391e24` (drop vcs-svn experiment, 2020-08-13) removed most vcs-svn files. Drop the remaining header files as well, as they are no longer used. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-14 15:48:23 -07:00
Jiang Xin	473eb54151	l10n: zh_CN: for git v2.31.0 l10n round 1 and 2 Translate 161 new messages (5104t0f0u) for git 2.31.0. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>	2021-03-15 00:05:25 +08:00
Jiang Xin	4bc948a743	Merge branch 'master' of github.com:vnwildman/git * 'master' of github.com:vnwildman/git: l10n: vi.po(5104t): for git v2.31.0 l10n round 2	2021-03-15 00:04:47 +08:00
Jiang Xin	e196890735	Merge branch 'l10n/zh_TW/210301' of github.com:l10n-tw/git-po * 'l10n/zh_TW/210301' of github.com:l10n-tw/git-po: l10n: zh_TW.po: v2.31.0 round 2 (15 untranslated)	2021-03-14 22:35:44 +08:00
Jiang Xin	84bc81478e	Merge branch 'po-id' of github.com:bagasme/git-po * 'po-id' of github.com:bagasme/git-po: l10n: Add translation team info l10n: start Indonesian translation	2021-03-14 22:35:17 +08:00
Jiang Xin	bd5fba827b	Merge branch 'master' of github.com:Softcatala/git-po * 'master' of github.com:Softcatala/git-po: l10n: Update Catalan translation	2021-03-14 22:34:46 +08:00
Jiang Xin	2d897529b2	Merge branch 'russian-l10n' of github.com:DJm00n/git-po-ru * 'russian-l10n' of github.com:DJm00n/git-po-ru: l10n: ru.po: update Russian translation	2021-03-14 22:34:12 +08:00
Jiang Xin	799df2e406	Merge branch 'pt-PT' of github.com:git-l10n-pt-PT/git-po * 'pt-PT' of github.com:git-l10n-pt-PT/git-po: l10n: pt_PT: add Portuguese translations part 1	2021-03-14 22:33:26 +08:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
René Scharfe	f1121499e6	git-compat-util.h: drop trailing semicolon from macro definition Make CALLOC_ARRAY usable like a function by requiring callers to supply the trailing semicolon, which all of the current ones already do. With the extra semicolon e.g. the following code wouldn't compile because it disconnects the "else" from the "if": if (condition) CALLOC_ARRAY(ptr, n); else whatever(); Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 15:56:13 -08:00
Jonathan Nieder	53204061ac	doc: describe mergetool configuration in git-mergetool(1) In particular, this describes mergetool.hideResolved, which can help users discover this setting (either because it may be useful to them or in order to understand mergetool's behavior if they have forgotten setting it in the past). Tested by running make -C Documentation git-mergetool.1 man Documentation/git-mergetool.1 and reading through the page. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 15:34:32 -08:00
Jonathan Nieder	b2a51c1b03	mergetool: do not enable hideResolved by default When `98ea309b3f` (mergetool: add hideResolved configuration, 2021-02-09) introduced the mergetool.hideResolved setting to reduce the clutter in viewing non-conflicted sections of files in a mergetool, it enabled it by default, explaining: No adverse effects were noted in a small survey of popular mergetools[1] so this behavior defaults to `true`. In practice, alas, adverse effects do appear. A few issues: 1. No indication is shown in the UI that the base, local, and remote versions shown have been modified by additional resolution. This is inherent in the design: the idea of mergetool.hideResolved is to convince a mergetool that expects pristine local, base, and remote files to show partially resolved verisons of those files instead; there is no additional source of information accessible to the mergetool to see where the resolution has happened. (By contrast, a mergetool generating the partial resolution from conflict markers for itself would be able to hilight the resolved sections with a different color.) A user accustomed to seeing the files without partial resolution gets no indication that this behavior has changed when they upgrade Git. 2. If the computed merge did not line up the files correctly (for example due to repeated sections in the file), the partially resolved files can be misleading and do not have enough information to reconstruct what happened and compute the correct merge result. 3. Resolving a conflict can involve information beyond the textual conflict. For example, if the local and remote versions added overlapping functionality in different ways, seeing the full unresolved versions of each alongside the base gives information about each side's intent that makes it possible to come up with a resolution that combines those two intents. By contrast, when starting with partially resolved versions of those files, one can produce a subtly wrong resolution that includes redundant extra code added by one side that is not needed in the approach taken on the other. All that said, a user wanting to focus on textual conflicts with reduced clutter can still benefit from mergetool.hideResolved=true as a way to deemphasize sections of the code that resolve cleanly without requiring any changes to the invoked mergetool. The caveats described above are reduced when the user has explicitly turned this on, because then the user is aware of them. Flip the default to 'false'. Reported-by: Dana Dahlstrom <dahlstrom@google.com> Helped-by: Seth House <seth@eseth.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 15:30:29 -08:00
Junio C Hamano	13d7ab6b5d	Git 2.31-rc2	2021-03-08 16:09:43 -08:00
Junio C Hamano	56a57652ef	Sync with Git 2.30.2 for CVE-2021-21300 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-08 16:09:07 -08:00
Junio C Hamano	6c46f864e5	Merge branch 'jt/transfer-fsck-across-packs-fix' The code to fsck objects received across multiple packs during a single git fetch session has been broken when the packfile URI feature was in use. A workaround has been added by disabling the codepath to avoid keeping a packfile that is too small. * jt/transfer-fsck-across-packs-fix: fetch-pack: do not mix --pack_header and packfile uri	2021-03-08 16:04:47 -08:00
Matthias Rüster	834845142d	l10n: de.po: Update German translation for Git v2.31.0 Reviewed-by: Ralf Thielow <ralf.thielow@gmail.com> Reviewed-by: Phillip Szelat <phillip.szelat@gmail.com> Signed-off-by: Matthias Rüster <matthias.ruester@gmail.com>	2021-03-08 19:49:33 +01:00
Andrzej Hunt	68b5c3aa48	Makefile: update 'make fuzz-all' docs to reflect modern clang Clang no longer produces a libFuzzer.a. Instead, you can include libFuzzer by using -fsanitize=fuzzer. Therefore we should use that in the example command for building fuzzers. We also add -fsanitize=fuzzer-no-link to the CFLAGS to ensure that all the required instrumentation is added when compiling git [1], and remove -fsanitize-coverage=trace-pc-guard as it is deprecated. I happen to have tested with LLVM 11 - however -fsanitize=fuzzer appears to work in a wide range of reasonably modern clangs. (On my system: what used to be libFuzzer.a now lives under the following path, which is tricky albeit not impossible for a novice such as myself to find: /usr/lib64/clang/11.0.0/lib/linux/libclang_rt.fuzzer-x86_64.a ) [1] https://releases.llvm.org/11.0.0/docs/LibFuzzer.html#fuzzer-usage Signed-off-by: Andrzej Hunt <ajrhunt@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-08 10:26:25 -08:00
René Scharfe	241b5d3ebe	fix xcalloc() argument order Pass the number of elements first and ther size second, as expected by xcalloc(). Provide a semantic patch, which was actually used to generate the rest of this patch. The semantic patch would generate flip-flop diffs if both arguments are sizeofs. We don't have such a case, and it's hard to imagine the usefulness of such an allocation. If it ever occurs then we could deal with it by duplicating the rule in the semantic patch to make it cancel itself out, or we could change the code to use CALLOC_ARRAY. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-08 09:45:04 -08:00
Daniel Santos	408985d301	l10n: pt_PT: add Portuguese translations part 1 * Newlines corrected. * Add concept translation table. * Translated some. * Corrected some. * Corrected some 'Negation of Emptiness'. Signed-off-by: Daniel Santos <hello@brighterdan.com>	2021-03-08 15:21:51 +00:00
Tran Ngoc Quan	1369935987	l10n: vi.po(5104t): for git v2.31.0 l10n round 2 Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>	2021-03-08 09:03:04 +07:00
Christopher Diaz Riveros	b0adcc311b	l10n: es: 2.31.0 round 2 Signed-off-by: Christopher Diaz Riveros <christopher.diaz.riv@gmail.com>	2021-03-07 18:31:14 -05:00
Bagas Sanjaya	c21ad4d941	l10n: Add translation team info Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>	2021-03-07 19:38:29 +07:00
Bagas Sanjaya	8c4abfb8be	l10n: start Indonesian translation * Initialize PO file * Translate init-db.c * Translate wt-status.c * Translate builtin/clone.c * Translate builtin/checkout.c * Translate builtin/fetch.c * Complete core translations: * builtin/remote.c * builtin/index-pack.c * push.c * reset.c * Sync with l10n upstream Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>	2021-03-07 19:38:07 +07:00
Jonathan Tan	2aec3bc4b6	fetch-pack: do not mix --pack_header and packfile uri When fetching (as opposed to cloning) from a repository with packfile URIs enabled, an error like this may occur: fatal: pack has bad object at offset 12: unknown object type 5 fatal: finish_http_pack_request gave result -1 fatal: fetch-pack: expected keep then TAB at start of http-fetch output This bug was introduced in `b664e9ffa1` ("fetch-pack: with packfile URIs, use index-pack arg", 2021-02-22), when the index-pack args used when processing the inline packfile of a fetch response and when processing packfile URIs were unified. This bug happens because fetch, by default, partially reads (and consumes) the header of the inline packfile to determine if it should store the downloaded objects as a packfile or loose objects, and thus passes --pack_header=<...> to index-pack to inform it that some bytes are missing. However, when it subsequently fetches the additional packfiles linked by URIs, it reuses the same index-pack arguments, thus wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are missing. This does not happen when cloning because "git clone" always passes do_keep, which instructs the fetch mechanism to always retain the packfile, eliminating the need to read the header. There are a few ways to fix this, including filtering out pack_header arguments when downloading the additional packfiles, but I decided to stick to always using index-pack throughout when packfile URIs are present - thus, Git no longer needs to read the bytes, and no longer needs --pack_header here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-05 15:04:09 -08:00
Yi-Jyun Pan	8278f87022	l10n: zh_TW.po: v2.31.0 round 2 (15 untranslated) Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>	2021-03-06 02:43:34 +08:00
Alexander Shopov	2f176de687	l10n: bg.po: Updated Bulgarian translation (5104t) Signed-off-by: Alexander Shopov <ash@kambanaria.org>	2021-03-05 12:12:34 +01:00
Jiang Xin	1ecef023a9	Merge branch 'fr_next' of github.com:jnavila/git * 'fr_next' of github.com:jnavila/git: l10n: fr: v2.31 rnd 2	2021-03-05 13:47:07 +08:00
Jiang Xin	5b888ad949	Merge branch 'master' of github.com:nafmo/git-l10n-sv * 'master' of github.com:nafmo/git-l10n-sv: l10n: sv.po: Update Swedish translation (5104t0f0u)	2021-03-05 13:46:25 +08:00
Junio C Hamano	be7935ed8b	Merged the open-eintr workaround for macOS Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-04 15:42:50 -08:00
Elijah Newren	58d581c344	Documentation/RelNotes: improve release note for rename detection work There were some early changes in the 2.31 cycle to optimize some setup in diffcore-rename.c[1], some later changes to measure performance[2], and finally some significant changes to improve rename detection performance. The final one was merged with the note Performance optimization work on the rename detection continues. That works for the commit log, but feels misleading as a release note since all the changes were within one cycle. Simplify this to just Performance improvements for rename detection. The former wording could be seen as hinting that more performance improvements will come in 2.32, which is true, but we can just cover those in the 2.32 release notes when the time comes. [1] `a5ac31b5b1` (Merge branch 'en/diffcore-rename', 2021-01-25) [2] `d3a035b055` (Merge branch 'en/merge-ort-perf', 2021-02-11) [3] `12bd17521c` (Merge branch 'en/diffcore-rename', 2021-03-01) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-04 15:38:11 -08:00
Junio C Hamano	921846fa22	Merge branch 'jk/open-returns-eintr' Work around platforms whose open() is reported to return EINTR (it shouldn't, as we do our signals with SA_RESTART). * jk/open-returns-eintr: config.mak.uname: enable OPEN_RETURNS_EINTR for macOS Big Sur Makefile: add OPEN_RETURNS_EINTR knob	2021-03-04 15:34:45 -08:00
Jean-Noël Avila	068cb92300	l10n: fr: v2.31 rnd 2 Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>	2021-03-04 21:53:45 +01:00
Junio C Hamano	85c787f1e9	Merge https://github.com/prati0100/git-gui * https://github.com/prati0100/git-gui: Revert "git-gui: remove lines starting with the comment character"	2021-03-04 12:38:50 -08:00
Emir Sarı	f6a7e896b8	l10n: tr: v2.31.0-rc1 Signed-off-by: Emir Sarı <bitigchi@me.com>	2021-03-04 22:29:24 +03:00
Peter Krefting	929dc48e96	l10n: sv.po: Update Swedish translation (5104t0f0u) Signed-off-by: Peter Krefting <peter@softwolves.pp.se>	2021-03-04 19:10:43 +01:00
Jiang Xin	9b7e82b940	l10n: git.pot: v2.31.0 round 2 (9 new, 8 removed) Generate po/git.pot from v2.31.0-rc1 for git v2.31.0 l10n round 2. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>	2021-03-04 22:41:21 +08:00
Jiang Xin	4dd8469336	Merge branch 'master' of github.com:git/git * 'master' of github.com:git/git: (63 commits) Git 2.31-rc1 Hopefully the last batch before -rc1 Revert "commit-graph: when incompatible with graphs, indicate why" read-cache: make the index write buffer size 128K dir: fix malloc of root untracked_cache_dir commit-graph.c: display correct number of chunks when writing doc/reftable: document how to handle windows fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods ...	2021-03-04 22:40:13 +08:00
Pratyush Yadav	df4f9e28f6	Merge branch 'py/revert-commit-comments' This commit causes breakage on macOS, or in fact any platform using older versions of Tcl. Revert it. * py/revert-commit-comments: Revert "git-gui: remove lines starting with the comment character"	2021-03-04 13:59:45 +05:30
Pratyush Yadav	c0698df057	Revert "git-gui: remove lines starting with the comment character" This reverts commit `b9a43869c9`. This commit causes breakage on macOS (10.13). It causes errors on startup and completely breaks the commit functionality. There are two main problems. First, it uses `string cat` which is not supported on older Tcl versions. Second, it does a half close of the bidirectional pipe to git-stripspace which is also not supported on older Tcl versions. Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>	2021-03-04 13:53:27 +05:30
Torsten Bögershausen	93c3d297b5	git mv foo FOO ; git mv foo bar gave an assert The following sequence, on a case-insensitive file system, (strictly speeking with core.ignorecase=true) leads to an assertion failure and leaves .git/index.lock behind. git init echo foo >foo git add foo git mv foo FOO git mv foo bar This regression was introduced in Commit `9b906af657`, "git-mv: improve error message for conflicted file" The bugfix is to change the "file exist case-insensitive in the index" into the correct "file exist (case-sensitive) in the index". This avoids the "assert" later in the code and keeps setting up the "ce" pointer for ce_stage(ce) done in the next else if. This fixes https://github.com/git-for-windows/git/issues/2920 Reported-By: Dan Moseley <Dan.Moseley@microsoft.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-03 17:07:12 -08:00
Junio C Hamano	f01623b2c9	Git 2.31-rc1 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-02 22:41:13 -08:00
Emir Sarı	3ed77c4792	l10n: tr: v2.31.0-rc0 Signed-off-by: Emir Sarı <bitigchi@me.com>	2021-03-02 20:14:46 +08:00
Junio C Hamano	ec125d1bc1	Hopefully the last batch before -rc1 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-01 14:02:58 -08:00
Junio C Hamano	9889cff6d6	Merge branch 'jh/untracked-cache-fix' An under-allocation for the untracked cache data has been corrected. * jh/untracked-cache-fix: dir: fix malloc of root untracked_cache_dir	2021-03-01 14:02:58 -08:00
Junio C Hamano	ada7c5fae5	Merge branch 'ns/raise-write-index-buffer-size' Raise the buffer size used when writing the index file out from (obviously too small) 8kB to (clearly sufficiently large) 128kB. * ns/raise-write-index-buffer-size: read-cache: make the index write buffer size 128K	2021-03-01 14:02:58 -08:00
Junio C Hamano	28714238c8	Merge branch 'hv/trailer-formatting' The logic to handle "trailer" related placeholders in the "--format=" mechanisms in the "log" family and "for-each-ref" family is getting unified. * hv/trailer-formatting: ref-filter: use pretty.c logic for trailers pretty.c: capture invalid trailer argument pretty.c: refactor trailer logic to `format_set_trailers_options()` t6300: use function to test trailer options	2021-03-01 14:02:58 -08:00
Junio C Hamano	18aabfaee5	Merge branch 'hn/reftable-tables-doc-update' Documentation update. * hn/reftable-tables-doc-update: doc/reftable: document how to handle windows	2021-03-01 14:02:57 -08:00
Junio C Hamano	fbad3505ee	Merge branch 'sv/t7001-modernize' Test script modernization. * sv/t7001-modernize: t7001: use `test` rather than `[` t7001: use here-docs instead of echo t7001: put each command on a separate line t7001: use '>' rather than 'touch' t7001: avoid using `cd` outside of subshells t7001: remove whitespace after redirect operators t7001: modernize subshell formatting t7001: remove unnecessary blank lines t7001: indent with TABs instead of spaces t7001: modernize test formatting	2021-03-01 14:02:57 -08:00
Junio C Hamano	6ee353d42f	Merge branch 'jt/transfer-fsck-across-packs' The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args	2021-03-01 14:02:57 -08:00
Junio C Hamano	660dd97a62	Merge branch 'ds/chunked-file-api' The common code to deal with "chunked file format" that is shared by the multi-pack-index and commit-graph files have been factored out, to help codepaths for both filetypes to become more robust. * ds/chunked-file-api: commit-graph.c: display correct number of chunks when writing chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods midx: add num_large_offsets to write_midx_context midx: add pack_perm to write_midx_context midx: add entries to write_midx_context midx: use context in write_midx_pack_names() midx: rename pack_info to write_midx_context commit-graph: use chunk-format write API chunk-format: create chunk format write API commit-graph: anonymize data in chunk_write_fn	2021-03-01 14:02:57 -08:00
Junio C Hamano	12bd17521c	Merge branch 'en/diffcore-rename' Performance optimization work on the rename detection continues. * en/diffcore-rename: merge-ort: call diffcore_rename() directly gitdiffcore doc: mention new preliminary step for rename detection diffcore-rename: guide inexact rename detection based on basenames diffcore-rename: complete find_basename_matches() diffcore-rename: compute basenames of source and dest candidates t4001: add a test comparing basename similarity and content similarity diffcore-rename: filter rename_src list when possible diffcore-rename: no point trying to find a match better than exact	2021-03-01 14:02:56 -08:00
Junio C Hamano	700696bcfc	Merge branch 'jh/fsmonitor-prework' Preliminary changes to fsmonitor integration. * jh/fsmonitor-prework: fsmonitor: refactor initialization of fsmonitor_last_update token fsmonitor: allow all entries for a folder to be invalidated fsmonitor: log FSMN token when reading and writing the index fsmonitor: log invocation of FSMonitor hook to trace2 read-cache: log the number of scanned files to trace2 read-cache: log the number of lstat calls to trace2 preload-index: log the number of lstat calls to trace2 p7519: add trace logging during perf test p7519: move watchman cleanup earlier in the test p7519: fix watchman watch-list test on Windows p7519: do not rely on "xargs -d" in test	2021-03-01 14:02:56 -08:00
Junio C Hamano	90917373cd	Merge https://github.com/prati0100/git-gui * https://github.com/prati0100/git-gui: git-gui: remove lines starting with the comment character git-gui: fix typo in russian locale	2021-03-01 09:22:18 -08:00
Junio C Hamano	c0b27e3964	Merge branch 'js/commit-graph-warning' * js/commit-graph-warning: Revert "commit-graph: when incompatible with graphs, indicate why"	2021-03-01 09:21:24 -08:00
Junio C Hamano	cdc986a7c2	Revert "commit-graph: when incompatible with graphs, indicate why" This reverts commit `c85eec7fc3`, as it is a bit overzealous, we are in prerelease freeze, and we want to have enough time to get this right and cook in 'next'. cf. <8735xgkvuo.fsf@evledraar.gmail.com>	2021-03-01 09:19:37 -08:00
Jeff King	bbabaad298	config.mak.uname: enable OPEN_RETURNS_EINTR for macOS Big Sur We've had mixed reports on whether the latest release of macOS needs this Makefile knob set. In most reported cases, there's antivirus software running (which one might imagine could cause an open() call to be delayed). However, one of the (off-list) reports I've gotten indicated that it happened on an otherwise clean install of Big Sur. Since the symptom is so bad (checkout randomly fails to write several fails when the progress meter kicks in), and since the workaround is so lightweight (if we don't see EINTR, it's just an extra conditional check), let's just turn it on by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-01 09:07:45 -08:00
Jiang Xin	75f5efcba2	Merge branch 'master' of github.com:nafmo/git-l10n-sv * 'master' of github.com:nafmo/git-l10n-sv: l10n: sv.po: Update Swedish translation (5103t0f0u)	2021-03-01 10:01:02 +08:00
Jiang Xin	0b71d789a8	Merge branch 'pl' of github.com:Arusekk/git-po * 'pl' of github.com:Arusekk/git-po: l10n: pl.po: Update translation	2021-03-01 09:59:07 +08:00
Peter Krefting	fe8885258b	l10n: sv.po: Update Swedish translation (5103t0f0u) Signed-off-by: Peter Krefting <peter@softwolves.pp.se>	2021-02-28 22:22:46 +01:00
Arusekk	fa42d191c6	l10n: pl.po: Update translation Signed-off-by: Arusekk <arek_koz@o2.pl>	2021-02-27 17:17:32 +01:00
Jean-Noël Avila	5ff5a30652	l10n: fr: v2.31.0 rnd 1 Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>	2021-02-27 15:47:45 +01:00
Taylor Blau	66f52fa26b	pack-revindex.c: don't close unopened file descriptors When opening a reverse index, load_revindex_from_disk() jumps to the 'cleanup' label in case something goes wrong: the reverse index had the wrong size, an unrecognized version, or similar. It also jumps to this label when the reverse index couldn't be opened in the first place, which will cause an error with the unguarded close() call in the label. Guard this call with "if (fd >= 0)" to make sure that we have a valid file descriptor to close before attempting to close it. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-26 14:42:27 -08:00
Jeff King	2b08101204	Makefile: add OPEN_RETURNS_EINTR knob On some platforms, open() reportedly returns EINTR when opening regular files and we receive a signal (usually SIGALRM from our progress meter). This shouldn't happen, as open() should be a restartable syscall, and we specify SA_RESTART when setting up the alarm handler. So it may actually be a kernel or libc bug for this to happen. But it has been reported on at least one version of Linux (on a network filesystem): https://lore.kernel.org/git/c8061cce-71e4-17bd-a56a-a5fed93804da@neanderfunk.de/ as well as on macOS starting with Big Sur even on a regular filesystem. We can work around it by retrying open() calls that get EINTR, just as we do for read(), etc. Since we don't ever _want_ to interrupt an open() call, we can get away with just redefining open, rather than insisting all callsites use xopen(). We actually do have an xopen() wrapper already (and it even does this retry, though there's no indication of it being an observed problem back then; it seems simply to have been lifted from xread(), etc). But it is used hardly anywhere, and isn't suitable for general use because it will die() on error. In theory we could combine the two, but it's awkward to do so because of the variable-args interface of open(). This patch adds a Makefile knob for enabling the workaround. It's not enabled by default for any platforms in config.mak.uname yet, as we don't have enough data to decide how common this is (I have not been able to reproduce on either Linux or Big Sur myself). It may be worth enabling preemptively anyway, since the cost is pretty low (if we don't see an EINTR, it's just an extra conditional). However, note that we must not enable this on Windows. It doesn't do anything there, and the macro overrides the existing mingw_open() redirection. I've added a preemptive #undef here in the mingw header (which is processed first) to just quietly disable it (we could also make it an #error, but there is little point in being so aggressive). Reported-by: Aleksey Kliger <alklig@microsoft.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-26 14:15:51 -08:00
Jiang Xin	712b0ed6ec	l10n: git.pot: v2.31.0 round 1 (155 new, 89 removed) Generate po/git.pot from v2.31.0-rc0 for git v2.31.0 l10n round 1. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>	2021-02-26 22:09:42 +08:00
Junio C Hamano	225365fb51	Git 2.31-rc0 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-25 16:43:33 -08:00
Junio C Hamano	140045821a	Merge branch 'jc/push-delete-nothing' "git push $there --delete ''" should have been diagnosed as an error, but instead turned into a matching push, which has been corrected. * jc/push-delete-nothing: push: do not turn --delete '' into a matching push	2021-02-25 16:43:33 -08:00
Junio C Hamano	cadae717d5	Merge branch 'sh/mergetools-vimdiff1' Mergetools update. * sh/mergetools-vimdiff1: mergetools/vimdiff: add vimdiff1 merge tool variant	2021-02-25 16:43:32 -08:00
Junio C Hamano	09e72204f8	Merge branch 'dl/doc-config-camelcase' A handful of multi-word configuration variable names in documentation that are spelled in all lowercase have been corrected to use the more canonical camelCase. * dl/doc-config-camelcase: index-format doc: camelCase core.excludesFile blame-options.txt: camelcase blame.blankBoundary i18n.txt: camel case and monospace "i18n.commitEncoding"	2021-02-25 16:43:32 -08:00
Junio C Hamano	1c8f5dfa42	Merge branch 'js/params-vs-args' Messages update. * js/params-vs-args: replace "parameters" by "arguments" in error messages	2021-02-25 16:43:32 -08:00
Junio C Hamano	d228b6b231	Merge branch 'ug/doc-commit-approxidate' Doc update. * ug/doc-commit-approxidate: doc: mention approxidates for git-commit --date	2021-02-25 16:43:32 -08:00
Junio C Hamano	d166e8c1d4	Merge branch 'es/maintenance-of-bare-repositories' The "git maintenance register" command had trouble registering bare repositories, which had been corrected. * es/maintenance-of-bare-repositories: maintenance: fix incorrect `maintenance.repo` path with bare repository	2021-02-25 16:43:32 -08:00
Junio C Hamano	f277234860	Merge branch 'mt/add-chmod-fixes' Various fixes on "git add --chmod". * mt/add-chmod-fixes: add: propagate --chmod errors to exit status add: mark --chmod error string for translation add --chmod: don't update index when --dry-run is used	2021-02-25 16:43:31 -08:00
Junio C Hamano	48923e8356	Merge branch 'ds/merge-base-independent' The code to implement "git merge-base --independent" was poorly done and was kept from the very beginning of the feature. * ds/merge-base-independent: commit-reach: stale commits may prune generation further commit-reach: use heuristic in remove_redundant() commit-reach: move compare_commits_by_gen commit-reach: use one walk in remove_redundant() commit-reach: reduce requirements for remove_redundant()	2021-02-25 16:43:31 -08:00
Junio C Hamano	682bbad64d	Merge branch 'ah/rebase-no-fork-point-config' "git rebase --[no-]fork-point" gained a configuration variable rebase.forkPoint so that users do not have to keep specifying a non-default setting. * ah/rebase-no-fork-point-config: rebase: add a config option for --no-fork-point	2021-02-25 16:43:31 -08:00
Junio C Hamano	628c13ccee	Merge branch 'mt/grep-sparse-checkout' "git grep" has been tweaked to be limited to the sparse checkout paths. * mt/grep-sparse-checkout: grep: honor sparse-checkout on working tree searches	2021-02-25 16:43:31 -08:00
Junio C Hamano	3c8e6dda21	Merge branch 'ah/commit-graph-leakplug' Plug a minor memory leak. * ah/commit-graph-leakplug: commit-graph: avoid leaking topo_levels slab in write_commit_graph()	2021-02-25 16:43:31 -08:00
Junio C Hamano	6eea44cee1	Merge branch 'zh/difftool-skip-to' "git difftool" learned "--skip-to=<path>" option to restart an interrupted session from an arbitrary path. * zh/difftool-skip-to: difftool.c: learn a new way start at specified file	2021-02-25 16:43:31 -08:00
Junio C Hamano	ccf6861b72	Merge branch 'cw/pack-config-doc' Doc update. * cw/pack-config-doc: doc: mention bigFileThreshold for packing	2021-02-25 16:43:31 -08:00
Junio C Hamano	dddb420535	Merge branch 'jc/maint-column-doc-typofix' Doc update. * jc/maint-column-doc-typofix: Documentation: typofix --column description	2021-02-25 16:43:30 -08:00
Junio C Hamano	2638e33c82	Merge branch 'ma/doc-markup-fix' Docfix. * ma/doc-markup-fix: gitmailmap.txt: fix rendering of e-mail addresses git.txt: fix monospace rendering rev-list-options.txt: fix rendering of bonus paragraph	2021-02-25 16:43:30 -08:00
Junio C Hamano	845d6030f8	Merge branch 'jc/diffcore-rotate' "git {diff,log} --{skip,rotate}-to=<path>" allows the user to discard diff output for early paths or move them to the end of the output. * jc/diffcore-rotate: diff: --{rotate,skip}-to=<path>	2021-02-25 16:43:30 -08:00
Junio C Hamano	3da165ca28	Merge branch 'mt/checkout-index-corner-cases' The error codepath around the "--temp/--prefix" feature of "git checkout-index" has been improved. * mt/checkout-index-corner-cases: checkout-index: omit entries with no tempname from --temp output write_entry(): fix misuses of `path` in error messages	2021-02-25 16:43:30 -08:00
Junio C Hamano	f47c3328ef	Merge branch 'js/doc-proto-v2-response-end' Docfix. * js/doc-proto-v2-response-end: doc: fix naming of response-end-pkt	2021-02-25 16:43:30 -08:00
Junio C Hamano	18decfd11d	Merge branch 'rs/blame-optim' Optimization in "git blame" * rs/blame-optim: blame: remove unnecessary use of get_commit_info()	2021-02-25 16:43:29 -08:00
Junio C Hamano	d590ae5560	Merge branch 'mz/doc-notes-are-not-anchors' Objects that lost references can be pruned away, even when they have notes attached to it (and these notes will become dangling, which in turn can be pruned with "git notes prune"). This has been clarified in the documentation. * mz/doc-notes-are-not-anchors: docs: clarify that refs/notes/ do not keep the attached objects alive	2021-02-25 16:43:29 -08:00
Junio C Hamano	608cc4f273	Merge branch 'ab/detox-gettext-tests' Removal of GIT_TEST_GETTEXT_POISON continues. * ab/detox-gettext-tests: tests: remove most uses of test_i18ncmp tests: remove last uses of C_LOCALE_OUTPUT tests: remove most uses of C_LOCALE_OUTPUT tests: remove last uses of GIT_TEST_GETTEXT_POISON=false	2021-02-25 16:43:29 -08:00
Junio C Hamano	6fe12b5215	Merge branch 'jk/rev-list-disk-usage' "git rev-list" command learned "--disk-usage" option. * jk/rev-list-disk-usage: docs/rev-list: add some examples of --disk-usage docs/rev-list: add an examples section rev-list: add --disk-usage option for calculating disk usage t: add --no-tag option to test_commit	2021-02-25 16:43:29 -08:00
Junio C Hamano	7dd0eaa39c	index-format doc: camelCase core.excludesFile Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 15:21:25 -08:00
Junio C Hamano	edaf10dd26	blame-options.txt: camelcase blame.blankBoundary All other references to blame.* configuration variables are camelCased already. Update this one to match. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 15:21:25 -08:00
Denton Liu	77645b5daa	i18n.txt: camel case and monospace "i18n.commitEncoding" In `95791be750` (doc: camelCase the i18n config variables to improve readability, 2017-07-17), the other i18n config variables were camel cased. However, this one instance was missed. Camel case and monospace "i18n.commitEncoding" so that it matches the surrounding text. Signed-off-by: Denton Liu <liu.denton@gmail.com> [jc: fixed 3 other mistakes that are exactly the same] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 15:21:25 -08:00
Neeraj Singh	f279894d28	read-cache: make the index write buffer size 128K Writing an index 8K at a time invokes the OS filesystem and caching code very frequently, introducing noticeable overhead while writing large indexes. When experimenting with different write buffer sizes on Windows writing the Windows OS repo index (260MB), most of the benefit came by bumping the index write buffer size to 64K. I picked 128K to ensure that we're past the knee of the curve. With this change, the time under do_write_index for an index with 3M files goes from ~1.02s to ~0.72s. Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 13:40:30 -08:00
Matheus Tavares	9ebd7fe158	add: propagate --chmod errors to exit status If `add` encounters an error while applying the --chmod changes, it prints a message to stderr, but exits with a success code. This might have been an oversight, as the command does exit with a non-zero code in other situations where it cannot (or refuses to) update all of the requested paths (e.g. when some of the given paths are ignored). So make the exit behavior more consistent by also propagating --chmod errors to the exit status. Note: the test "all statuses changed in folder if . is given" uses paths added by previous test cases, some of which might be symbolic links. Because `git add --chmod` will now fail with such paths, this test would depend on whether all the previous tests were executed, or only some of them. Avoid that by running the test on a fresh repo with only regular files. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 12:14:51 -08:00
Matheus Tavares	48960894f5	add: mark --chmod error string for translation This error message is intended for humans, so mark it for translation. Also use error() instead of fprintf(stderr, ...), to make the corresponding line a bit cleaner, and to display the "error:" prefix, which helps classifying the nature/severity of the message. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 12:14:51 -08:00
Matheus Tavares	c937d70bfb	add --chmod: don't update index when --dry-run is used `git add --chmod` applies the mode changes even when `--dry-run` is used. Fix that and add some tests for this option combination. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 12:14:51 -08:00
Jeff Hostetler	6347d649bc	dir: fix malloc of root untracked_cache_dir Use FLEX_ALLOC_STR() to allocate the `struct untracked_cache_dir` for the root directory. Get rid of unsafe code that might fail to initialize the `name` field (if FLEX_ARRAY is not 1). This will make it clear that we intend to have a structure with an empty string following it. A problem was observed on Windows where the length of the memset() was too short, so the first byte of the name field was not zeroed. This resulted in the name field having garbage from a previous use of that area of memory. The record for the root directory was then written to the untracked-cache extension in the index. This garbage would then be visible to future commands when they reloaded the untracked-cache extension. Since the directory record for the root directory had garbage in the `name` field, the `t/helper/test-tool dump-untracked-cache` tool printed this garbage as the path prefix (rather than '/') for each directory in the untracked cache as it recursed. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 12:09:10 -08:00
Alex Henrie	2803d800d2	rebase: add a config option for --no-fork-point Some users (myself included) would prefer to have this feature off by default because it can silently drop commits. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 11:49:10 -08:00
Taylor Blau	c4ff24bbb3	commit-graph.c: display correct number of chunks when writing When writing a commit-graph, a progress meter is shown which indicates the number of pieces of data to write (one per commit in each chunk). In `47410aa837` (commit-graph: use chunk-format write API, 2021-02-18), the number of chunks became tracked by the new chunk-format API. But a stray local variable was left behind from when write_commit_graph_file() used to keep track of the same. Since this was no longer updated after `47410aa837`, the progress meter appeared broken: $ git commit-graph write --reachable Expanding reachable commits in commit graph: 837569, done. Writing out commit graph in 3 passes: 166% (4187845/2512707), done. Drop the local variable and rely instead on the chunk-format API to tell us the correct number of chunks. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 11:44:34 -08:00
Jordi Mas	a9926ecd54	l10n: Update Catalan translation Signed-off-by: Jordi Mas <jmas@softcatala.org>	2021-02-24 09:14:56 +01:00
Junio C Hamano	20e416409f	push: do not turn --delete '' into a matching push When we added a syntax sugar "git push remote --delete <ref>" to "git push" as a synonym to the canonical "git push remote :<ref>" syntax at `f517f1f2` (builtin-push: add --delete as syntactic sugar for :foo, 2009-12-30), we weren't careful enough to make sure that <ref> is not empty. Blindly rewriting "--delete <ref>" to ":<ref>" means that an empty string <ref> results in refspec ":", which is the syntax to ask for "matching" push that does not delete anything. Worse yet, if there were matching refs that can be fast-forwarded, they would have been published prematurely, even if the user feels that they are not ready yet to be pushed out, which would be a real disaster. Noticed-by: Tilman Vogel <tilman.vogel@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 15:19:34 -08:00
Jeff King	01168a9d89	doc: mention approxidates for git-commit --date We describe the more strict date formats accepted by GIT_COMMITTER_DATE, etc, but the --date option also allows the looser approxidate formats, as well. Unfortunately we don't have a good or complete reference for this format, but let's at least mention that it _is_ looser, and give a few examples. If we ever write separate, more complete date-format documentation, we should refer to it from here. Based-on-a-patch-by: Utku Gultopu <ugultopu@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 13:33:02 -08:00
Johannes Sixt	b865734760	replace "parameters" by "arguments" in error messages When an error message informs the user about an incorrect command invocation, it should refer to "arguments", not "parameters". Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 13:30:45 -08:00
Seth House	30bb8088af	mergetools/vimdiff: add vimdiff1 merge tool variant This adds yet another vimdiff/gvimdiff variant and presents conflicts as a two-way diff between 'LOCAL' and 'REMOTE'. 'MERGED' is not opened which deviates from the norm so usage text is echoed as a Vim message on startup that instructs the user with how to proceed and how to abort. Vimdiff is well-suited to two-way diffs so this is an option for a more simple, more streamlined conflict resolution. For example: it is difficult to communicate differences across more than two files using only syntax highlighting; default vimdiff commands to get and put changes between buffers do not need the user to manually specify a source or destination buffer when only using two buffers. Like other merge tools that directly compare 'LOCAL' with 'REMOTE', this tool will benefit when paired with the new `mergetool.hideResolved` setting. Signed-off-by: Seth House <seth@eseth.com> Tested-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 11:37:13 -08:00
Han-Wen Nienhuys	00f68732e5	doc/reftable: document how to handle windows On Windows we can't delete or overwrite files opened by other processes. Here we sketch how to handle this situation. We propose to use a random element in the filename. It's possible to design an alternate solution based on counters, but that would assign semantics to the filenames that complicates implementation. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 10:01:21 -08:00
Eric Sunshine	26c7974376	maintenance: fix incorrect `maintenance.repo` path with bare repository The periodic maintenance tasks configured by `git maintenance start` invoke `git for-each-repo` to run `git maintenance run` on each path specified by the multi-value global configuration variable `maintenance.repo`. Because `git for-each-repo` will likely be run outside of the repositories which require periodic maintenance, it is mandatory that the repository paths specified by `maintenance.repo` are absolute. Unfortunately, however, `git maintenance register` does nothing to ensure that the paths it assigns to `maintenance.repo` are indeed absolute, and may in fact -- especially in the case of a bare repository -- assign a relative path to `maintenance.repo` instead. Fix this problem by converting all paths to absolute before assigning them to `maintenance.repo`. While at it, also fix `git maintenance unregister` to convert paths to absolute, as well, in order to ensure that it can correctly remove from `maintenance.repo` a path assigned via `git maintenance register`. Reported-by: Clement Moyroud <clement.moyroud@gmail.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 00:22:45 -08:00
Junio C Hamano	966e671106	The tenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 16:12:43 -08:00
Junio C Hamano	d68fccef86	Merge branch 'ab/test-lib' Test framework clean-up. * ab/test-lib: test-lib-functions: assert correct parameter count test-lib-functions: remove bug-inducing "diagnostics" helper param test libs: rename "diff-lib" to "lib-diff" t/.gitattributes: sort lines test-lib-functions: move function to lib-bitmap.sh test libs: rename gitweb-lib.sh to lib-gitweb.sh test libs: rename bundle helper to "lib-bundle.sh" test-lib-functions: remove generate_zero_bytes() wrapper test-lib-functions: move test_set_index_version() to its user test lib: change "error" to "BUG" as appropriate test-lib: remove check_var_migration	2021-02-22 16:12:43 -08:00
Junio C Hamano	45df6c4d75	Merge branch 'ab/diff-deferred-free' A small memleak in "diff -I<regexp>" has been corrected. * ab/diff-deferred-free: diff: plug memory leak from regcomp() on {log,diff} -I diff: add an API for deferred freeing	2021-02-22 16:12:43 -08:00
Junio C Hamano	dcb11fc622	Merge branch 'ab/pager-exit-log' When a pager spawned by us exited, the trace log did not record its exit status correctly, which has been corrected. * ab/pager-exit-log: pager: properly log pager exit code when signalled run-command: add braces for "if" block in wait_or_whine() pager: test for exit code with and without SIGPIPE pager: refactor wait_for_pager() function	2021-02-22 16:12:43 -08:00
Junio C Hamano	dc24948be9	Merge branch 'ta/hash-function-transition-doc' Update formatting and grammar of the hash transition plan documentation, plus some updates. * ta/hash-function-transition-doc: doc: use https links doc hash-function-transition: move rationale upwards doc hash-function-transition: fix incomplete sentence doc hash-function-transition: use upper case consistently doc hash-function-transition: use SHA-1 and SHA-256 consistently doc hash-function-transition: fix asciidoc output	2021-02-22 16:12:43 -08:00
Junio C Hamano	15af6e6fee	Merge branch 'bc/signed-objects-with-both-hashes' Signed commits and tags now allow verification of objects, whose two object names (one in SHA-1, the other in SHA-256) are both signed. * bc/signed-objects-with-both-hashes: gpg-interface: remove other signature headers before verifying ref-filter: hoist signature parsing commit: allow parsing arbitrary buffers with headers gpg-interface: improve interface for parsing tags commit: ignore additional signatures when parsing signed commits ref-filter: switch some uses of unsigned long to size_t	2021-02-22 16:12:42 -08:00
Junio C Hamano	b9554c03a0	Merge branch 'dl/stash-cleanup' Documentation, code and test clean-up around "git stash". * dl/stash-cleanup: stash: declare ref_stash as an array t3905: use test_cmp() to check file contents t3905: replace test -s with test_file_not_empty t3905: remove nested git in command substitution t3905: move all commands into test cases t3905: remove spaces after redirect operators git-stash.txt: be explicit about subcommand options	2021-02-22 16:12:42 -08:00
Andrzej Hunt	bf4bb9f9f5	commit-graph: avoid leaking topo_levels slab in write_commit_graph() write_commit_graph initialises topo_levels using init_topo_level_slab(), next it calls compute_topological_levels() which can cause the slab to grow, we therefore need to clear the slab again using clear_topo_level_slab() when we're done. First introduced in `72a2bfca` (commit-graph: add a slab to store topological levels, 2021-01-16). LeakSanitizer output: ==1026==ERROR: LeakSanitizer: detected memory leaks Direct leak of 8 byte(s) in 1 object(s) allocated from: #0 0x498ae9 in realloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0xafbed8 in xrealloc /src/git/wrapper.c:126:8 #2 0x7966d1 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1 #3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1 #4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12 #5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2 #6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6 #7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11 #8 0x4cddb1 in run_builtin /src/git/git.c:453:11 #9 0x4cabe2 in handle_builtin /src/git/git.c:704:3 #10 0x4cd084 in run_argv /src/git/git.c:771:4 #11 0x4ca424 in cmd_main /src/git/git.c:902:19 #12 0x707fb6 in main /src/git/common-main.c:52:11 #13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f) Indirect leak of 524256 byte(s) in 1 object(s) allocated from: #0 0x498942 in calloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 #1 0xafc088 in xcalloc /src/git/wrapper.c:140:8 #2 0x796870 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1 #3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1 #4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12 #5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2 #6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6 #7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11 #8 0x4cddb1 in run_builtin /src/git/git.c:453:11 #9 0x4cabe2 in handle_builtin /src/git/git.c:704:3 #10 0x4cd084 in run_argv /src/git/git.c:771:4 #11 0x4ca424 in cmd_main /src/git/git.c:902:19 #12 0x707fb6 in main /src/git/common-main.c:52:11 #13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f) SUMMARY: AddressSanitizer: 524264 byte(s) leaked in 2 allocation(s). Signed-off-by: Andrzej Hunt <ajrhunt@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:45:01 -08:00
ZheNing Hu	1c881026a1	difftool.c: learn a new way start at specified file `git difftool` only allow us to select file to view in turn. If there is a commit with many files and we exit in the middle, we will have to traverse list again to get the file diff which we want to see. Therefore,teach the command an option `--skip-to=<path>` to allow the user to say that diffs for earlier paths are not interesting (because they were already seen in an earlier session) and start this session with the named path. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:35:49 -08:00
Derrick Stolee	41f3c9949f	commit-reach: stale commits may prune generation further The remove_redundant_with_gen() algorithm performs a depth-first-search to find commits in the 'array' list, starting at the parents of each commit in 'array'. The result is that commits in 'array' are marked STALE when they are reachable from another commit in 'array'. This depth-first-search is fast when commits lie on or near the first-parent history of the higher commits. The search terminates early if all but one commit becomes marked STALE. However, it is possible that there are two independent commits with high generation number. In that case, the depth-first-search might languish by searching in lower generations due to the fixed min_generation used throughout the method. With the expectation that commits with lower generation are expected to become STALE more often, we can optimize further by increasing that min_generation boundary upon discovery of the commit with minimum generation. We must first sort the commits in 'array' by generation. We cannot sort 'array' itself since it must preserve relative order among the returned results (see revision.c:mark_redundant_parents() for an example). This simplifies the initialization of min_generation, but it also allows us to increase the new min_generation when we find the commit with smallest generation remaining. This requires more than two commits in order to test, so I used the Linux kernel repository with a few commits that are slightly off of the first-parent history. I timed the following command: git merge-base --independent 2ecedd756908 d2360a398f0b \ 1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e The first two commits have similar generation and are near the v5.10 tag. Commit 160bab43419e is off of the first-parent history behind v5.5, while the others are scattered somewhere reachable from v5.9. This is designed to demonstrate the optimization, as that commit within v5.5 would normally cause a lot of extra commit walking. Since remove_redundant_with_alg() is called only when at least one of the input commits has a finite generation number, this algorithm is tested with a commit-graph generated starting at a number of different tags, the earliest being v5.5. commit-graph at v5.5: \| Method \| Time \| \|-----------------------+-------\| \| _no_gen() \| 864ms \| \| _with_gen() (before) \| 858ms \| \| _with_gen() (after) \| 810ms \| commit-graph at v5.7: \| Method \| Time \| \|-----------------------+-------\| \| _no_gen() \| 625ms \| \| _with_gen() (before) \| 572ms \| \| _with_gen() (after) \| 517ms \| commit-graph at v5.9: \| Method \| Time \| \|-----------------------+-------\| \| _no_gen() \| 268ms \| \| _with_gen() (before) \| 224ms \| \| _with_gen() (after) \| 202ms \| commit-graph at v5.10: \| Method \| Time \| \|-----------------------+-------\| \| _no_gen() \| 72ms \| \| _with_gen() (before) \| 37ms \| \| _with_gen() (after) \| 9ms \| Note that these are only modest improvements for the case where the two independent commits are not in the commit-graph (not until v5.10). All algorithms get faster as more commits are indexed, which is not a surprise. However, the cost of walking extra commits is more and more prevalent in relative terms as more commits are indexed. Finally, the last case allows us to jump to the minimum generation between the last two commits (that are actually independent) so we greatly reduce the cost in that case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:34:34 -08:00
Derrick Stolee	3677773371	commit-reach: use heuristic in remove_redundant() Reachability algorithms in commit-reach.c frequently benefit from using the first-parent history as a heuristic for satisfying reachability queries. The most obvious example was implemented in `4fbcca4e` (commit-reach: make can_all_from_reach... linear, 2018-07-20). Update the walk in remove_redundant() to use this same heuristic. Here, we are walking starting at the parents of the input commits. Sort those parents and walk from the highest generation to lower. Each time, use the heuristic of searching the first parent history before continuing to expand the walk. The order in which we explore the commits matters, so update compare_commits_by_gen to break generation number ties with commit date. This has no effect when the commits are in a commit-graph file with corrected commit dates computed, but it will assist when the commits are in the region "above" the commit-graph with "infinite" generation number. Note that we cannot shift to use compare_commits_by_gen_then_commit_date as the method prototype is different. We use compare_commits_by_gen for QSORT() as opposed to as a priority function. The important piece is to ensure we short-circuit the walk when we find that there is a single non-redundant commit. This happens frequently when looking for merge-bases or comparing several tags with 'git merge-base --independent'. Use a new count 'count_still_independent' and if that hits 1 we can stop walking. To update 'count_still_independent' properly, we add use of the RESULT flag on the input commits. Then we can detect when we reach one of these commits and decrease the count. We need to remove the RESULT flag at that moment because we might re-visit that commit when popping the stack. We use the STALE flag to mark parents that have been added to the new walk_start list, but we need to clear that flag before we start walking so those flags don't halt our depth-first-search walk. On my copy of the Linux kernel repository, the performance of 'git merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11 seconds. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:34:34 -08:00
Derrick Stolee	c8d693e1e6	commit-reach: move compare_commits_by_gen Move this earlier in the file so it can be used by more methods. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:34:34 -08:00
Derrick Stolee	fbc21e3fbb	commit-reach: use one walk in remove_redundant() The current implementation of remove_redundant() uses several calls to paint_down_to_common() to determine that commits are independent of each other. This leads to quadratic behavior when many inputs are passed to commands such as 'git merge-base'. For example, in the Linux kernel repository, I tested the performance by passing all tags: git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)") (Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do not point to commits.) Here is the performance improvement introduced by this change: Before: 16.4s After: 1.1s This performance improvement requires the commit-graph file to be present. We keep the old algorithm around as remove_redundant_no_gen() and use it when generation_numbers_enabled() is false. This is similar to other algorithms within commit-reach.c. The new algorithm is implemented in remove_redundant_with_gen(). The basic approach is to do one commit walk instead of many. First, scan all commits in the list and mark their _parents_ with the STALE flag. This flag will indicate commits that are reachable from one of the inputs, except not including themselves. Then, walk commits until covering all commits up to the minimum generation number pushing the STALE flag throughout. At the end, we need to clear the STALE bit from all of the commits we walked. We move the non-stale commits in 'array' to the beginning of the list, and this might overwrite stale commits. However, we store an array of commits that started the walk, and use clear_commit_marks() on each of those starting commits. That method will walk the reachable commits with the STALE bit and clear them all. This makes the algorithm safe for re-entry or for other uses of those commits after this walk. This logic is covered by tests in t6600-test-reach.sh, so the behavior does not change. This is tested both in the case with a commit-graph and without. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:34:34 -08:00
Christian Walther	3a837b58e3	doc: mention bigFileThreshold for packing Knowing about the core.bigFileThreshold configuration variable is helpful when examining pack file size differences between repositories. Add a reference to it to the manpages a user is likely to read in this situation. Capitalize CONFIGURATION for consistency with other pages having such a section. Signed-off-by: Christian Walther <cwalther@gmx.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:18:30 -08:00
Jonathan Tan	5476e1efde	fetch-pack: print and use dangling .gitmodules Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed. This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Jonathan Tan	b664e9ffa1	fetch-pack: with packfile URIs, use index-pack arg Unify the index-pack arguments used when processing the inline pack and when downloading packfiles referenced by URIs. This is done by teaching get_pack() to also store the index-pack arguments whenever at least one packfile URI is given, and then when processing the packfile URI(s), using the stored arguments. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Jonathan Tan	27e35ba6c6	http-fetch: allow custom index-pack args This is the next step in teaching fetch-pack to pass its index-pack arguments when processing packfiles referenced by URIs. The "--keep" in fetch-pack.c will be replaced with a full message in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Jonathan Tan	726b25a91b	http: allow custom index-pack args Currently, when fetching, packfiles referenced by URIs are run through index-pack without any arguments other than --stdin and --keep, no matter what arguments are used for the packfile that is inline in the fetch response. As a preparation for ensuring that all packs (whether inline or not) use the same index-pack arguments, teach the http subsystem to allow custom index-pack arguments. http-fetch has been updated to use the new API. For now, it passes --keep alone instead of --keep with a process ID, but this is only temporary because http-fetch itself will be taught to accept index-pack parameters (instead of using a hardcoded constant) in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Pratyush Yadav	b1056f60b6	Merge branch 'py/commit-comments' Use git-stripspace to remove comment lines from the commit message. Also use it to clean up whitespace instead of rolling our own logic. * py/commit-comments: git-gui: remove lines starting with the comment character	2021-02-22 20:19:53 +05:30
Junio C Hamano	1b5b8cf072	Documentation: typofix --column description `f4ed0af6` (Merge branch 'nd/columns', 2012-05-03) brought in three cut-and-pasted copies of malformatted descriptions. Let's fix them all the same way by marking the configuration variable names up as monospace just like the command line option `--column` is typeset. While we are at it, correct a missing space after the full stop that ends the sentence. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-19 19:36:47 -08:00
Derrick Stolee	a43a2e6c2a	chunk-format: add technical docs The chunk-based file format is now an API in the code, but we should also take time to document it as a file format. Specifically, it matches the CHUNK LOOKUP sections of the commit-graph and multi-pack-index files, but there are some commonalities that should be grouped in this document. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	5387fefadc	chunk-format: restore duplicate chunk checks Before refactoring into the chunk-format API, the commit-graph parsing logic included checks for duplicate chunks. It is unlikely that we would desire a chunk-based file format that allows duplicate chunk IDs in the table of contents, so add duplicate checks into read_table_of_contents(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	329fac3a36	midx: use 64-bit multiplication for chunk sizes When calculating the sizes of certain chunks, we should use 64-bit multiplication always. This allows us to properly predict the chunk sizes without risk of overflow. Other possible overflows were discovered by evaluating each multiplication in midx.c and ensuring that at least one side of the operator was of type size_t or off_t. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	6ab3b8b8b8	midx: use chunk-format read API Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). In particular, we can use the return value of pair_chunk() to generate an error when a required chunk is missing. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	2692c2f6fd	commit-graph: use chunk-format read API Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). While the current implementation loses the duplicate-chunk detection, that will be added in a future change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	5f0879f54b	chunk-format: create read chunk API Add the capability to read the table of contents, then pair the chunks with necessary logic using read_chunk_fn pointers. Callers will be added in future changes, but the typical outline will be: 1. initialize a 'struct chunkfile' with init_chunkfile(NULL). 2. call read_table_of_contents(). 3. for each chunk to parse, a. call pair_chunk() to assign a pointer with the chunk position, or b. call read_chunk() to run a callback on the chunk start and size. 4. call free_chunkfile() to clear the 'struct chunkfile' data. We are re-using the anonymous 'struct chunkfile' data, as it is internal to the chunk-format API. This gives it essentially two modes: write and read. If the same struct instance was used for both reads and writes, then there would be failures. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	63a8f0e9b9	midx: use chunk-format API in write_midx_internal() The chunk-format API allows writing the table of contents and all chunks using the anonymous 'struct chunkfile' type. We only need to convert our local chunk logic to this API for the multi-pack-index writes to share that logic with the commit-graph file writes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	c1442410d8	midx: drop chunk progress during write Most expensive operations in write_midx_internal() use the context struct's progress member, and these indicate the process of the expensive operations within the chunk writing methods. However, there is a competing progress struct that counts the progress over all chunks. This is not very helpful compared to the others, so drop it. This also reduces our barriers to combining the chunk writing code with chunk-format.c. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	0ccd713cb6	midx: return success/failure in chunk write methods Historically, the chunk-writing methods in midx.c have returned the amount of data written so the writer method could compare this with the table of contents. This presents with some interesting issues: 1. If a chunk writing method has a bug that miscalculates the written bytes, then we can satisfy the table of contents without actually writing the right amount of data to the hashfile. The commit-graph writing code checks the hashfile struct directly for a more robust verification. 2. There is no way for a chunk writing method to gracefully fail. Returning an int presents an opportunity to fail without a die(). 3. The current pattern doesn't match chunk_write_fn type exactly, so we cannot share code with commit-graph.c For these reasons, convert the midx chunk writer methods to return an 'int'. Since none of them fail at the moment, they all return 0. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	980f525c3c	midx: add num_large_offsets to write_midx_context In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t num_large_offsets" into the context. With this new data, write_midx_large_offsets() now matches the chunk_write_fn type. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	7a3ada1192	midx: add pack_perm to write_midx_context In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "uint32_t *pack_perm" and large_offsets_needed bit into the context. Update write_midx_object_offsets() to match chunk_write_fn. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	31bda9a237	midx: add entries to write_midx_context In an effort to align write_midx_internal() with the chunk-format API, continue to group necessary data into "struct write_midx_context". This change collects the "struct pack_midx_entry *entries" list and its count into the context. Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the context directly, as these are easy conversions with this new data. Only the callers of write_midx_object_offsets() and write_midx_large_offsets() are updated here, since additional data in the context before those methods can match chunk_write_fn. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	b4d941420b	midx: use context in write_midx_pack_names() In an effort to align the write_midx_internal() to use the chunk-format API, start converting chunk writing methods to match chunk_write_fn. The first case is to convert write_midx_pack_names() to take "void *data". We already have the necessary data in "struct write_midx_context", so this conversion is rather mechanical. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	577dc49696	midx: rename pack_info to write_midx_context In an effort to streamline our chunk-based file formats, align some of the code structure in write_midx_internal() to be similar to the patterns in write_commit_graph_file(). Specifically, let's create a "struct write_midx_context" that can be used as a data parameter to abstract function types. This change only renames "struct pack_info" to "struct write_midx_context" and the names of instances from "packs" to "ctx". In future changes, we will expand the data inside "struct write_midx_context" and align our chunk-writing method with the chunk-format API. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	47410aa837	commit-graph: use chunk-format write API The commit-graph write logic is ready to make use of the chunk-format write API. Each chunk write method is already in the correct prototype. We only need to use the 'struct chunkfile' pointer and the correct API calls. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	570df42610	chunk-format: create chunk format write API In anticipation of combining the logic from the commit-graph and multi-pack-index file formats, create a new chunk-format API. Use a 'struct chunkfile' pointer to keep track of data that has been registered for writes. This struct is anonymous outside of chunk-format.c to ensure no user attempts to interfere with the data. The next change will use this API in commit-graph.c, but the general approach is: 1. initialize the chunkfile with init_chunkfile(f). 2. add chunks in the intended writing order with add_chunk(). 3. write any header information to the hashfile f. 4. write the chunkfile data using write_chunkfile(). 5. free the chunkfile struct using free_chunkfile(). Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Martin Ågren	f89f46b704	gitmailmap.txt: fix rendering of e-mail addresses Both AsciiDoc and Asciidoctor are eager to pick up the e-mail addresses in this document and turn them into references at the bottom of the manpage / clickable links. We don't really need that for these dummy addresses. Spell "@" as "@" to make them not do this. In the open block, we can instead avoid this by indenting the contents, similar to the earlier blocks. Fix a backtick which should have been a single quote mark. With all the quoting that is going on around here, this mistake trips up the parsing and rendering quite a bit. Before this commit, we have the same failure mode with AsciiDoc 8.6.10 and Asciidoctor 1.5.5, and this change makes both of them happy. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 10:53:33 -08:00
Martin Ågren	83171ede22	git.txt: fix monospace rendering When we write `<name>`s with the "s" tucked on to the closing backtick, we end up rendering the backticks literally. Rephrase this sentence slightly to render this as monospace. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 10:53:33 -08:00
Pratyush Yadav	b9a43869c9	git-gui: remove lines starting with the comment character The comment character is specified by the config variable 'core.commentchar'. Any lines starting with this character is considered a comment and should not be included in the final commit message. Teach git-gui to filter out lines in the commit message that start with the comment character using git-stripspace. If the config is not set, '#' is taken as the default. Also add a message educating users about the comment character. Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>	2021-02-18 23:35:57 +05:30
Junio C Hamano	2283e0e9af	The ninth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 17:21:43 -08:00
Junio C Hamano	483e09e810	Merge branch 'ak/config-bad-bool-error' The error message given when a configuration variable that is expected to have a boolean value has been improved. * ak/config-bad-bool-error: config: improve error message for boolean config	2021-02-17 17:21:43 -08:00
Junio C Hamano	e68f62be8d	Merge branch 'js/reflog-expire-stale-fix' "git reflog expire --stale-fix" can be used to repair the reflog by removing entries that refer to objects that have been pruned away, but was not careful to tolerate missing objects. * js/reflog-expire-stale-fix: reflog expire --stale-fix: be generous about missing objects	2021-02-17 17:21:43 -08:00
Junio C Hamano	726b11d68a	Merge branch 'js/commit-graph-warning' When certain features (e.g. grafts) used in the repository are incompatible with the use of the commit-graph, we used to silently turned commit-graph off; we now tell the user what we are doing. * js/commit-graph-warning: commit-graph: when incompatible with graphs, indicate why	2021-02-17 17:21:42 -08:00
Junio C Hamano	e9b4c483c7	Merge branch 'ew/rev-parse-since-test' Test to make sure "git rev-parse one-thing one-thing" gives the same thing twice (when one-thing is --since=X). * ew/rev-parse-since-test: t1500: ensure current --since= behavior remains	2021-02-17 17:21:42 -08:00
Junio C Hamano	d494433d26	Merge branch 'ds/maintenance-pack-refs' "git maintenance" tool learned a new "pack-refs" maintenance task. * ds/maintenance-pack-refs: maintenance: incremental strategy runs pack-refs weekly maintenance: add pack-refs task	2021-02-17 17:21:42 -08:00
Junio C Hamano	fdf3a27ca9	Merge branch 'jx/t5411-unique-filenames' Avoid individual tests in t5411 from getting affected by each other by forcing them to use separate output files during the test. * jx/t5411-unique-filenames: t5411: refactor check of refs using test_cmp_refs t5411: use different out file to prevent overwriting	2021-02-17 17:21:42 -08:00
Junio C Hamano	9e634a91c8	Merge branch 'js/fsck-name-objects-fix' Fix "git fsck --name-objects" which apparently has not been used by anybody who is motivated enough to report breakage. * js/fsck-name-objects-fix: fsck --name-objects: be more careful parsing generation numbers t1450: robustify `remove_object()`	2021-02-17 17:21:42 -08:00
Junio C Hamano	9bdccbcda7	Merge branch 'jk/mailmap-only-at-root' The .mailmap is documented to be read only from the root level of a working tree, but a stray file in a bare repository also was read by accident, which has been corrected. * jk/mailmap-only-at-root: mailmap: only look for .mailmap in work tree	2021-02-17 17:21:42 -08:00
Junio C Hamano	f712632a51	Merge branch 'mt/grep-cached-untracked' "git grep --untracked" is meant to be "let's ALSO find in these files on the filesystem" when looking for matches in the working tree files, and does not make any sense if the primary search is done against the index, or the tree objects. The "--cached" and "--untracked" options have been marked as mutually incompatible. * mt/grep-cached-untracked: grep: error out if --untracked is used with --cached	2021-02-17 17:21:41 -08:00
Junio C Hamano	78a26cb720	Merge branch 'sh/mergetool-hideresolved' "git mergetool" feeds three versions (base, local and remote) of a conflicted path unmodified. The command learned to optionally prepare these files with unconflicted parts already resolved. * sh/mergetool-hideresolved: mergetool: add per-tool support and overrides for the hideResolved flag mergetool: break setup_tool out into separate initialization function mergetool: add hideResolved configuration	2021-02-17 17:21:41 -08:00
Junio C Hamano	aa2d3dbdf5	Merge branch 'jt/trace2-BUG' Even though invocations of "die()" were logged to the trace2 system, "BUG()"s were not, which has been corrected. * jt/trace2-BUG: usage: trace2 BUG() invocations	2021-02-17 17:21:41 -08:00
Junio C Hamano	dadc91ff0c	Merge branch 'js/range-diff-one-side-only' The "git range-diff" command learned "--(left\|right)-only" option to show only one side of the compared range. * js/range-diff-one-side-only: range-diff: offer --left-only/--right-only options range-diff: move the diffopt initialization down one layer range-diff: combine all options in a single data structure range-diff: simplify code spawning `git log` range-diff: libify the read_patches() function again range-diff: avoid leaking memory in two error code paths	2021-02-17 17:21:41 -08:00
Junio C Hamano	77348b0e6e	Merge branch 'js/range-diff-wo-dotdot' There are other ways than ".." for a single token to denote a "commit range", namely "<rev>^!" and "<rev>^-<n>", but "git range-diff" did not understand them. * js/range-diff-wo-dotdot: range-diff(docs): explain how to specify commit ranges range-diff/format-patch: handle commit ranges other than A..B range-diff/format-patch: refactor check for commit range	2021-02-17 17:21:41 -08:00
Junio C Hamano	69571dfe21	Merge branch 'jt/clone-unborn-head' "git clone" tries to locally check out the branch pointed at by HEAD of the remote repository after it is done, but the protocol did not convey the information necessary to do so when copying an empty repository. The protocol v2 learned how to do so. * jt/clone-unborn-head: clone: respect remote unborn HEAD connect, transport: encapsulate arg in struct ls-refs: report unborn targets of symrefs	2021-02-17 17:21:40 -08:00
Junio C Hamano	0871fb9af5	Merge branch 'mr/bisect-in-c-4' Piecemeal of rewrite of "git bisect" in C continues. * mr/bisect-in-c-4: bisect--helper: retire `--check-and-set-terms` subcommand bisect--helper: reimplement `bisect_skip` shell function in C bisect--helper: retire `--bisect-auto-next` subcommand bisect--helper: use `res` instead of return in BISECT_RESET case option bisect--helper: retire `--bisect-write` subcommand bisect--helper: reimplement `bisect_replay` shell function in C bisect--helper: reimplement `bisect_log` shell function in C	2021-02-17 17:21:40 -08:00
Junio C Hamano	5bd0b21bf7	Merge branch 'ds/commit-graph-genno-fix' Fix incremental update of commit-graph file around corrected commit date data. * ds/commit-graph-genno-fix: commit-graph: prepare commit graph commit-graph: be extra careful about mixed generations commit-graph: compute generations separately commit-graph: validate layers for generation data commit-graph: always parse before commit_graph_data_at() commit-graph: use repo_parse_commit	2021-02-17 17:21:40 -08:00
Junio C Hamano	8b4701ae4f	Merge branch 'ak/corrected-commit-date' The commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal. * ak/corrected-commit-date: doc: add corrected commit date info commit-reach: use corrected commit dates in paint_down_to_common() commit-graph: use generation v2 only if entire chain does commit-graph: implement generation data chunk commit-graph: implement corrected commit date commit-graph: return 64-bit generation number commit-graph: add a slab to store topological levels t6600-test-reach: generalize *_three_modes commit-graph: consolidate fill_commit_graph_info revision: parse parent in indegree_walk_step() commit-graph: fix regression when computing Bloom filters	2021-02-17 17:21:40 -08:00
Joey Salazar	9d336655ba	doc: fix naming of response-end-pkt Git Protocol version 2[1] defines 0002 as a Message Packet that indicates the end of a response for stateless connections. Change the naming of the 0002 Packet to 'Response End' to match the parsing introduced in Wireshark's MR !1922 for consistency. A subsequent MR in Wireshark will address additional mismatches. [1] kernel.org/pub/software/scm/git/docs/technical/protocol-v2.html [2] gitlab.com/wireshark/wireshark/-/merge_requests/1922 Signed-off-by: Joey Salazar <jgsal@protonmail.com> Reviewed-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 16:30:43 -08:00
Jeff King	a1db097e10	docs/rev-list: add some examples of --disk-usage It's not immediately obvious why --disk-usage might be a useful thing. These examples show off a few of the real-world cases I've used it for. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 16:25:29 -08:00
Jeff King	669b458755	docs/rev-list: add an examples section We currently don't show any examples of using git-rev-list at all. Let's add some pretty elementary examples. They likely seem obvious to anybody who has worked with the tool for a while, but my purpose here is two-fold: - they may be enlightening to people who haven't used the tool a lot to give a general flavor of how it is meant to be used - they can serve as a starting point for adding more interesting examples (we can do that without the basic ones, of course, but I think it makes sense to show off the building blocks) This set is far from exhaustive, but again, the purpose is to be a starting point for further additions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 16:22:13 -08:00
Martin Ågren	452d26448d	rev-list-options.txt: fix rendering of bonus paragraph In git-log(1) -- but not in git-shortlog(1) or git-rev-list(1) -- we include a bonus paragraph in the description of `--first-parent`. But we forgot to add a lone "+" for a list continuation, and we shouldn't be indenting this second paragraph. As a result, we get a different indentation and the `backticks` render literally. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 13:16:11 -08:00
Rafael Silva	8e16effe97	blame: remove unnecessary use of get_commit_info() When `git blame --color-by-age`, the determine_line_heat() is called to select how to color the output based on the commit's author date. It uses the get_commit_info() to parse the information into a `commit_info` structure, however, this is actually unnecessary because the determine_line_heat() caller also does the same. Instead, let's change the determine_line_heat() to take a `commit_info` structure and remove the internal call to get_commit_info() thus cleaning up and optimizing the code path. Enabling Git's trace2 API in order to record the execution time for every call to determine_line_heat() function: + trace2_region_enter("blame", "determine_line_heat", the_repository); determine_line_heat(ent, &default_color); + trace2_region_enter("blame", "determine_line_heat", the_repository); Then, running `git blame` for "kernel/fork.c" in linux.git and summing all the execution time for every call (around 1.3k calls) resulted in 2.6x faster execution (best out 3): git built from `328c109303` (The eighth batch, 2021-02-12) = 42ms git built from `328c109303` + this change = 16ms Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 11:04:17 -08:00
Jeff Hostetler	fcd19b09f8	fsmonitor: refactor initialization of fsmonitor_last_update token Isolate and document initialization of `istate->fsmonitor_last_update`. This field should contain a fsmonitor-specific opaque token, but we need to initialize it before we can actually talk to a fsmonitor process, so we create a generic default value. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:35 -08:00
Kevin Willford	ff03836b9d	fsmonitor: allow all entries for a folder to be invalidated Allow fsmonitor to report directory changes by reporting paths with a trailing slash. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:35 -08:00
Jeff Hostetler	29fbbf43a0	fsmonitor: log FSMN token when reading and writing the index Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:35 -08:00
Jeff Hostetler	940b94f35c	fsmonitor: log invocation of FSMonitor hook to trace2 Let's measure the time taken to request and receive FSMonitor data via the hook API and the size of the response. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	15268d12be	read-cache: log the number of scanned files to trace2 Report the number of files in the working directory that were read and their hashes verified in `refresh_index()`. FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	a98e0f2d31	read-cache: log the number of lstat calls to trace2 Report the total number of calls made to lstat() inside of refresh_index(). FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. This can be seen in `refresh_index()`. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	8c4b7503d0	preload-index: log the number of lstat calls to trace2 Report the total number of calls made to lstat() inside preload_index(). FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. This can be seen in `preload_index()`. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	4f2009dce2	p7519: add trace logging during perf test Add optional trace logging to allow us to better compare performance of various fsmonitor providers and compare results with non-fsmonitor runs. Currently, this includes Trace2 logging, but may be extended to include other trace targets, such as GIT_TRACE_FSMONITOR if desired. Using this logging helped me explain an odd behavior on MacOS where the kernel was dropping events and causing the hook to Watchman to timeout. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	a7556c3bde	p7519: move watchman cleanup earlier in the test Shutdown Watchman after the Watchman-based tests and before the block of "no fsmonitor" tests. This helps ensure that Watchman cannot affect the test results for the other. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	0917763d67	p7519: fix watchman watch-list test on Windows Only use the final portion of the test trash directory file name when verifying that Watchman was started. On Windows and under the SDK, $GIT_WORKTREE is a cygwin-style path with forward slashes and a "/c/" drive name. However `watchman watch-list` reports a proper Windows-style pathname with drive letters and backslashes. This causes the grep to fail. Since we don't really care about the full pathname (and we really don't want to bother with normalizaing them), just see if the test-name portion of the path is found. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Jeff Hostetler	eb10e637cf	p7519: do not rely on "xargs -d" in test Convert the test to use a more portable method to update the mtime on a large number of files under version control. The Mac version of xargs does not support the "-d" option. Likewise, the "-0" and "--null" options are not portable. Furthermore, use `test-tool chmtime` rather than `touch` to update the mtime to ensure that it is actually updated (especially on file systems with only whole second resolution). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 17:14:34 -08:00
Matheus Tavares	3f7ba60350	checkout-index: omit entries with no tempname from --temp output With --temp (or --stage=all, which implies --temp), checkout-index writes a list to stdout associating temporary file names to the entries' names. But if it fails to write an entry, and the failure happens before even assigning a temporary filename to that entry, we get an odd output line. This can be seen when trying to check out a symlink whose blob is missing: $ missing_blob=$(git hash-object --stdin </dev/null) $ git update-index --add --cacheinfo 120000,$missing_blob,foo $ git checkout-index --temp foo error: unable to read sha1 file of foo (`e69de29bb2`) foo The 'TAB foo' line is not much useful and it might break scripts that expect the 'tempname TAB foo' output. So let's omit such entries from the stdout list (but leaving the error message on stderr). We could also consider omitting _all_ failed entries from the output list, but that's probably not a good idea as the associated tempfiles may have been created even when checkout failed, so scripts may want to use the output list for cleanup. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 11:27:18 -08:00
Matheus Tavares	9334ea8e92	write_entry(): fix misuses of `path` in error messages The variables `path` and `ce->name`, at write_entry(), usually have the same contents, but that's not the case when using a checkout prefix or writing to a tempfile. (In fact, `path` will be either empty or dirty when writing to a tempfile.) Therefore, these variables cannot be used interchangeably. In this sense, fix wrong uses of `path` in error messages where it should really be `ce->name`, and add some regression tests. (Note: there doesn't seem to be any misuse in the other way around.) Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 11:27:17 -08:00
Junio C Hamano	1eb4136ac2	diff: --{rotate,skip}-to=<path> In the implementation of "git difftool", there is a case where the user wants to start viewing the diffs at a specific path and continue on to the rest, optionally wrapping around to the beginning. Since it is somewhat cumbersome to implement such a feature as a post-processing step of "git diff" output, let's support it internally with two new options. - "git diff --rotate-to=C", when the resulting patch would show paths A B C D E without the option, would "rotate" the paths to shows patch to C D E A B instead. It is an error when there is no patch for C is shown. - "git diff --skip-to=C" would instead "skip" the paths before C, and shows patch to C D E. Again, it is an error when there is no patch for C is shown. - "git log [-p]" also accepts these two options, but it is not an error if there is no change to the specified path. Instead, the set of output paths are rotated or skipped to the specified path or the first path that sorts after the specified path. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 09:30:42 -08:00
Elijah Newren	f78cf97617	merge-ort: call diffcore_rename() directly We want to pass additional information to diffcore_rename() (or some variant thereof) without plumbing that extra information through diff_tree_oid() and diffcore_std(). Further, since we will need to gather additional special information related to diffs and are walking the trees anyway in collect_merge_info(), it seems odd to have diff_tree_oid()/diffcore_std() repeat those tree walks. And there may be times where we can avoid traversing into a subtree in collect_merge_info() (based on additional information at our disposal), that the basic diff logic would be unable to take advantage of. For all these reasons, just create the add and delete pairs ourself and then call diffcore_rename() directly. This change is primarily about enabling future optimizations; the advantage of avoiding extra tree traversals is small compared to the cost of rename detection, and the advantage of avoiding the extra tree traversals is somewhat offset by the extra time spent in collect_merge_info() collecting the additional data anyway. However... For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 13.294 s ± 0.103 s 12.775 s ± 0.062 s mega-renames: 187.248 s ± 0.882 s 188.754 s ± 0.284 s just-one-mega: 5.557 s ± 0.017 s 5.599 s ± 0.019 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	07c9a7fcb5	gitdiffcore doc: mention new preliminary step for rename detection The last few patches have introduced a new preliminary step when rename detection is on but both break detection and copy detection are off. Document this new step. While we're at it, add a testcase that checks the new behavior as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	bd24aa2f97	diffcore-rename: guide inexact rename detection based on basenames Make use of the new find_basename_matches() function added in the last two patches, to find renames more rapidly in cases where we can match up files based on basenames. As a quick reminder (see the last two commit messages for more details), this means for example that docs/extensions.txt and docs/config/extensions.txt are considered likely renames if there are no remaining 'extensions.txt' files elsewhere among the added and deleted files, and if a similarity check confirms they are similar, then they are marked as a rename without looking for a better similarity match among other files. This is a behavioral change, as covered in more detail in the previous commit message. We do not use this heuristic together with either break or copy detection. The point of break detection is to say that filename similarity does not imply file content similarity, and we only want to know about file content similarity. The point of copy detection is to use more resources to check for additional similarities, while this is an optimization that uses far less resources but which might also result in finding slightly fewer similarities. So the idea behind this optimization goes against both of those features, and will be turned off for both. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 13.815 s ± 0.062 s 13.294 s ± 0.103 s mega-renames: 1799.937 s ± 0.493 s 187.248 s ± 0.882 s just-one-mega: 51.289 s ± 0.019 s 5.557 s ± 0.017 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	da09f65127	diffcore-rename: complete find_basename_matches() It is not uncommon in real world repositories for the majority of file renames to not change the basename of the file; i.e. most "renames" are just a move of files into different directories. We can make use of this to avoid comparing all rename source candidates with all rename destination candidates, by first comparing sources to destinations with the same basenames. If two files with the same basename are sufficiently similar, we record the rename; if not, we include those files in the more exhaustive matrix comparison. This means we are adding a set of preliminary additional comparisons, but for each file we only compare it with at most one other file. For example, if there was a include/media/device.h that was deleted and a src/module/media/device.h that was added, and there are no other device.h files in the remaining sets of added and deleted files after exact rename detection, then these two files would be compared in the preliminary step. This commit does not yet actually employ this new optimization, it merely adds a function which can be used for this purpose. The next commit will do the necessary plumbing to make use of it. Note that this optimization might give us different results than without the optimization, because it's possible that despite files with the same basename being sufficiently similar to be considered a rename, there's an even better match between files without the same basename. I think that is okay for four reasons: (1) it's easy to explain to the users what happened if it does ever occur (or even for them to intuitively figure out), (2) as the next patch will show it provides such a large performance boost that it's worth the tradeoff, and (3) it's somewhat unlikely that despite having unique matching basenames that other files serve as better matches. Reason (4) takes a full paragraph to explain... If the previous three reasons aren't enough, consider what rename detection already does. Break detection is not the default, meaning that if files have the same _fullname_, then they are considered related even if they are 0% similar. In fact, in such a case, we don't even bother comparing the files to see if they are similar let alone comparing them to all other files to see what they are most similar to. Basically, we override content similarity based on sufficient filename similarity. Without the filename similarity (currently implemented as an exact match of filename), we swing the pendulum the opposite direction and say that filename similarity is irrelevant and compare a full N x M matrix of sources and destinations to find out which have the most similar contents. This optimization just adds another form of filename similarity comparison, but augments it with a file content similarity check as well. Basically, if two files have the same basename and are sufficiently similar to be considered a rename, mark them as such without comparing the two to all other rename candidates. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	a35df3371c	diffcore-rename: compute basenames of source and dest candidates We want to make use of unique basenames among remaining source and destination files to help inform rename detection, so that more likely pairings can be checked first. (src/moduleA/foo.txt and source/module/A/foo.txt are likely related if there are no other 'foo.txt' files among the remaining deleted and added files.) Add a new function, not yet used, which creates a map of the unique basenames within rename_src and another within rename_dst, together with the indices within rename_src/rename_dst where those basenames show up. Non-unique basenames still show up in the map, but have an invalid index (-1). This function was inspired by the fact that in real world repositories, files are often moved across directories without changing names. Here are some sample repositories and the percentage of their historical renames (as of early 2020) that preserved basenames: * linux: 76% * gcc: 64% * gecko: 79% * webkit: 89% These statistics alone don't prove that an optimization in this area will help or how much it will help, since there are also unpaired adds and deletes, restrictions on which basenames we consider, etc., but it certainly motivated the idea to try something in this area. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	f3845257a5	t4001: add a test comparing basename similarity and content similarity Add a simple test where a removed file is similar to two different added files; one of them has the same basename, and the other has a slightly higher content similarity. In the current test, content similarity is weighted higher than filename similarity. Subsequent commits will add a new rule that weighs a mixture of filename similarity and content similarity in a manner that will change the outcome of this testcase. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	829514c515	diffcore-rename: filter rename_src list when possible We have to look at each entry in rename_src a total of rename_dst_nr times. When we're not detecting copies, any exact renames or ignorable rename paths will just be skipped over. While checking that these can be skipped over is a relatively cheap check, it's still a waste of time to do that check more than once, let alone rename_dst_nr times. When rename_src_nr is a few thousand times bigger than the number of relevant sources (such as when cherry-picking a commit that only touched a handful of files, but from a side of history that has different names for some high level directories), this time can add up. First make an initial pass over the rename_src array and move all the relevant entries to the front, so that we can iterate over just those relevant entries. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 14.119 s ± 0.101 s 13.815 s ± 0.062 s mega-renames: 1802.044 s ± 0.828 s 1799.937 s ± 0.493 s just-one-mega: 51.391 s ± 0.028 s 51.289 s ± 0.019 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Hariom Verma	ee82a487f6	ref-filter: use pretty.c logic for trailers Now, ref-filter is using pretty.c logic for setting trailer options. New to ref-filter: :key=<K> - only show trailers with specified key. :valueonly[=val] - only show the value part. :separator=<SEP> - inserted between trailer lines. :key_value_separator=<SEP> - inserted between key and value in trailer lines Enhancement to existing options(now can take value and its optional): :only[=val] :unfold[=val] 'val' can be: true, on, yes or false, off, no. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 16:48:38 -08:00
Hariom Verma	636a0aeedf	pretty.c: capture invalid trailer argument As we would like to use this trailers logic in the ref-filter, it's nice to get an invalid trailer argument. This will allow us to print precise error message while using `format_set_trailers_options()` in ref-filter. For capturing the invalid argument, we changed the working of `format_set_trailers_options()` a little bit. Original logic does "break" and fell through in mainly 2 cases - 1. unknown/invalid argument 2. end of the arg string But now instead of "break", we capture invalid argument and return non-zero. And non-zero is handled by the caller. (We prepared the caller to handle non-zero in the previous commit). Capturing invalid arguments this way will also affects the working of current logic. As at the end of the arg string it will return non-zero. So in order to make things correct, introduced an additional conditional statement i.e if encounter ")", do 'break'. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 16:48:38 -08:00
Hariom Verma	90563aedca	pretty.c: refactor trailer logic to `format_set_trailers_options()` Refactored trailers formatting logic inside pretty.c to a new function `format_set_trailers_options()`. This new function returns the non-zero in case of unusual. The caller handles the non-zero by "goto trailers_out". This change will allow us to reuse the same logic in other places. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 16:48:38 -08:00
Hariom Verma	727331dce1	t6300: use function to test trailer options Add a function to test trailer options. This will make tests look cleaner, as well as will make it easier to add new tests for trailers in the future. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 16:48:38 -08:00
Junio C Hamano	328c109303	The eighth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-12 14:21:04 -08:00
Junio C Hamano	8b25dee615	Merge branch 'tb/precompose-prefix-too' When commands are started from a subdirectory, they may have to compare the path to the subdirectory (called prefix and found out from $(pwd)) with the tracked paths. On macOS, $(pwd) and readdir() yield decomposed path, while the tracked paths are usually normalized to the precomposed form, causing mismatch. This has been fixed by taking the same approach used to normalize the command line arguments. * tb/precompose-prefix-too: MacOS: precompose_argv_prefix()	2021-02-12 14:21:04 -08:00
Junio C Hamano	006c5f79be	Merge branch 'jk/complete-branch-force-delete' The command line completion (in contrib/) completed "git branch -d" with branch names, but "git branch -D" offered tagnames in addition, which has been corrected. "git branch -M" had the same problem. * jk/complete-branch-force-delete: doc/git-branch: fix awkward wording for "-c" completion: handle other variants of "branch -m" completion: treat "branch -D" the same way as "branch -d"	2021-02-12 14:21:04 -08:00
Junio C Hamano	60f8121940	Merge branch 'jv/upload-pack-filter-spec-quotefix' Fix in passing custom args from "git clone" to "upload-pack" on the other side. * jv/upload-pack-filter-spec-quotefix: t5544: clarify 'hook works with partial clone' test upload-pack.c: fix filter spec quoting bug	2021-02-12 14:21:04 -08:00
Junio C Hamano	3c12d0b885	Merge branch 'tb/pack-revindex-on-disk' Introduce an on-disk file to record revindex for packdata, which traditionally was always created on the fly and only in-core. * tb/pack-revindex-on-disk: t5325: check both on-disk and in-memory reverse index pack-revindex: ensure that on-disk reverse indexes are given precedence t: support GIT_TEST_WRITE_REV_INDEX t: prepare for GIT_TEST_WRITE_REV_INDEX Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' builtin/pack-objects.c: respect 'pack.writeReverseIndex' builtin/index-pack.c: write reverse indexes builtin/index-pack.c: allow stripping arbitrary extensions pack-write.c: prepare to write 'pack-.rev' files packfile: prepare for the existence of '.rev' files	2021-02-12 14:21:04 -08:00
Junio C Hamano	2c873f9791	Merge branch 'ab/tests-various-fixup' Various test updates. * ab/tests-various-fixup: rm tests: actually test for SIGPIPE in SIGPIPE test archive tests: use a cheaper "zipinfo -h" invocation to get header upload-pack tests: avoid a non-zero "grep" exit status git-svn tests: rewrite brittle tests to use "--[no-]merges". git svn mergeinfo tests: refactor "test -z" to use test_must_be_empty git svn mergeinfo tests: modernize redirection & quoting style cache-tree tests: explicitly test HEAD and index differences cache-tree tests: use a sub-shell with less indirection cache-tree tests: remove unused $2 parameter cache-tree tests: refactor for modern test style	2021-02-12 14:21:04 -08:00
Elijah Newren	f15eb7c1cf	diffcore-rename: no point trying to find a match better than exact diffcore_rename() had some code to avoid having destination paths that already had an exact rename detected from being re-checked for other renames. Source paths, however, were re-checked because we wanted to allow the possibility of detecting copies. But if copy detection isn't turned on, then this merely amounts to attempting to find a better-than-exact match, which naturally ends up being an expensive no-op. In particular, copy detection is never turned on by the merge machinery. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 14.263 s ± 0.053 s 14.119 s ± 0.101 s mega-renames: 5504.231 s ± 5.150 s 1802.044 s ± 0.828 s just-one-mega: 158.534 s ± 0.498 s 51.391 s ± 0.028 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-12 12:04:00 -08:00
Ævar Arnfjörð Bjarmason	e7884b353b	test-lib-functions: assert correct parameter count Add assertions of the correct parameter count of various functions, in particularly the wrappers for the shell "test" built-in. In an earlier commit we fixed a bug with an incorrect number of arguments being passed to "test_path_is_{file,missing}". Let's also guard other similar functions from the same sort of misuse. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-12 11:58:21 -08:00
Ævar Arnfjörð Bjarmason	45a2686441	test-lib-functions: remove bug-inducing "diagnostics" helper param Remove the optional "diagnostics" parameter of the test_path_is_{file,dir,missing} functions. We have a lot of uses of these functions, but the only legitimate use of the diagnostics parameter is from when the functions themselves were introduced in `2caf20c52b` (test-lib: user-friendly alternatives to test [-d\|-f\|-e], 2010-08-10). But as the the rest of this diff demonstrates its presence did more to silently introduce bugs in our tests. Fix such bugs in the tests added in `ae4e89e549` (gc: add --keep-largest-pack option, 2018-04-15), and `c04ba51739` (t6046: testcases checking whether updates can be skipped in a merge, 2018-04-19). Let's also assert that those functions are called with exactly one parameter, a follow-up commit will add similar asserts to other functions in test-lib-functions.sh that we didn't have existing misuse of. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-12 11:58:21 -08:00
Ævar Arnfjörð Bjarmason	ebd73f50c6	test libs: rename "diff-lib" to "lib-diff" Rename the "diff-lib" to "lib-diff". With this rename and preceding commits there is no remaining t/lib which doesn't follow the convention of being called t/lib-*. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-12 11:58:21 -08:00
Junio C Hamano	f011795891	Sync with maint	2021-02-11 13:58:52 -08:00
Junio C Hamano	d3a035b055	Merge branch 'en/merge-ort-perf' The "ort" merge strategy. * en/merge-ort-perf: merge-ort: begin performance work; instrument with trace2_region_* calls merge-ort: ignore the directory rename split conflict for now merge-ort: fix massive leak	2021-02-11 13:58:44 -08:00
Junio C Hamano	a21e27ef6b	Merge branch 'en/ort-directory-rename' ORT merge strategy learns to infer "renamed directory" while merging. * en/ort-directory-rename: merge-ort: fix a directory rename detection bug merge-ort: process_renames() now needs more defensiveness merge-ort: implement apply_directory_rename_modifications() merge-ort: add a new toplevel_dir field merge-ort: implement handle_path_level_conflicts() merge-ort: implement check_for_directory_rename() merge-ort: implement apply_dir_rename() and check_dir_renamed() merge-ort: implement compute_collisions() merge-ort: modify collect_renames() for directory rename handling merge-ort: implement handle_directory_level_conflicts() merge-ort: implement compute_rename_counts() merge-ort: copy get_renamed_dir_portion() from merge-recursive.c merge-ort: add outline of get_provisional_directory_renames() merge-ort: add outline for computing directory renames merge-ort: collect which directories are removed in dirs_removed merge-ort: initialize and free new directory rename data structures merge-ort: add new data structures for directory rename detection	2021-02-11 13:58:43 -08:00
Andrew Klotz	f276e2a469	config: improve error message for boolean config Currently invalid boolean config values return messages about 'bad numeric', which is slightly misleading when the error was due to a boolean value. We can improve the developer experience by returning a boolean error message when we know the value is neither a bool text or int. before with an invalid boolean value of `non-boolean`, its unclear what numeric is referring to: fatal: bad numeric config value 'non-boolean' for 'commit.gpgsign': invalid unit now the error message mentions `non-boolean` is a bad boolean value: fatal: bad boolean config value 'non-boolean' for 'commit.gpgsign' Signed-off-by: Andrew Klotz <agc.klotz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:44:55 -08:00
Shubham Verma	488acf15df	t7001: use `test` rather than `[` According to Documentation/CodingGuidelines, we should use "test" rather than "[ ... ]" in shell scripts, so let's replace the "[ ... ]" with "test" in the t7001 test script. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:17 -08:00
Shubham Verma	39252c833e	t7001: use here-docs instead of echo Change from old style to current style by taking advantage of here-docs instead of echo commands. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	5d683c3f4b	t7001: put each command on a separate line Modern practice is to avoid multiple commands per line, and instead place each command on its own line. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	d2ecddc981	t7001: use '>' rather than 'touch' Use `>` rather than `touch` to create an empty file when the timestamp isn't relevant to the test. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	368d278249	t7001: avoid using `cd` outside of subshells Avoid using `cd` outside of subshells since, if the test fails, there is no guarantee that the current working directory is the expected one, which may cause subsequent tests to run in the wrong directory. While at it, make some other tests more concise by replacing simple subshells with `git -C`. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	dd72154149	t7001: remove whitespace after redirect operators According to Documentation/CodingGuidelines, there should be no whitespace after redirect operators. So, we should remove these whitespaces after redirect operators. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	9bcaeb71a6	t7001: modernize subshell formatting Some test use an old style for formatting subshells: (command && ... Update them to the modern style: ( command && ... Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	9b46e9c9cc	t7001: remove unnecessary blank lines Some tests use a deprecated style in which there are unnecessary blank lines after the opening quote of the test body and before the closing quote. So we should remove these unnecessary blank lines. Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	a76d90670a	t7001: indent with TABs instead of spaces Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Shubham Verma	5712d62ccf	t7001: modernize test formatting Some tests in this script are formatted using a very old style: test_expect_success \ 'title' \ 'body line 1 && body line 2' Update the formatting to the modern style: test_expect_success 'title' ' body line 1 && body line 2 ' Signed-off-by: Shubham Verma <shubhunic@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:42:16 -08:00
Denton Liu	3e885f0277	stash: declare ref_stash as an array Save sizeof(const char *) bytes by declaring ref_stash as an array instead of having a redundant pointer to an array. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Denton Liu	8c2462d1fe	t3905: use test_cmp() to check file contents Modernize the script by doing file content comparisons using test_cmp() instead of `test x = "$(cat file)"`. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Denton Liu	27e25a8cbf	t3905: replace test -s with test_file_not_empty In order to modernize the test script, replace `test -s` with test_file_not_empty(), which provides better diagnostic output in the case of failure. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Denton Liu	389ece4022	t3905: remove nested git in command substitution If a git command in a nested command substitution fails, it will be silently ignored since only the return code of the outer command substitutions is reported. Factor out nested command substitutions so that the error codes of those commands are reported. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Denton Liu	bbaa45c3aa	t3905: move all commands into test cases In order to modernize the tests, move commands that currently run outside of test cases into a test case. Where possible, clean up files that are produced using test_when_finished() but in the case where files persist over multiple test cases, create a new test case to perform cleanup. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Denton Liu	32b7385e43	t3905: remove spaces after redirect operators For shell scripts, the usual convention is for there to be no space after redirection operators, (e.g. `>file`, not `> file`). Remove these spaces wherever they appear. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Denton Liu	d6ab8b1929	git-stash.txt: be explicit about subcommand options Currently, the options for the `list` and `show` subcommands are just listed as `<options>`. This seems to imply, from a cursory glance at the summary, that they take the stash options listed below. However, reading more carefully, we see that they take log options and diff options respectively. Make it more obvious that they take log and diff options by explicitly stating this in the subcommand summary. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 13:34:58 -08:00
Jeff King	16950f8384	rev-list: add --disk-usage option for calculating disk usage It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone). You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: git rev-list --objects --no-object-names main..branch git cat-file --batch-check='%(objectsize:disk)' \| perl -lne '$total += $_; END { print $total }' Though note that the caveats from git-cat-file(1) apply here. We "blame" base objects more than their deltas, even though the relationship could easily be flipped. Still, it can be a useful rough measure. But one problem is that it's slow to run. Teaching rev-list to sum up the sizes can be much faster for two reasons: 1. It skips all of the piping of object names and sizes. 2. If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size. This patch implements a --disk-usage option which produces the same answer in a fraction of the time. Here are some timings using a clone of torvalds/linux: [rev-list piped to cat-file, no bitmaps] $ time git rev-list --objects --no-object-names --all \| git cat-file --buffer --batch-check='%(objectsize:disk)' \| perl -lne '$total += $_; END { print $total }' 1459938510 real 0m29.635s user 0m38.003s sys 0m1.093s [internal, no bitmaps] $ time git rev-list --disk-usage --objects --all 1459938510 real 0m31.262s user 0m30.885s sys 0m0.376s Even though the wall-clock time is slightly worse due to parallelism, notice the CPU savings between the two. We saved 21% of the CPU just by avoiding the pipes. But the real win is with bitmaps. If we use them without the new option: [rev-list piped to cat-file, bitmaps] $ time git rev-list --objects --no-object-names --all --use-bitmap-index \| git cat-file --batch-check='%(objectsize:disk)' \| perl -lne '$total += $_; END { print $total }' 1459938510 real 0m6.244s user 0m8.452s sys 0m0.311s then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up. But if we do both together: [internal, bitmaps] $ time git rev-list --disk-usage --objects --all --use-bitmap-index 1459938510 real 0m0.219s user 0m0.169s sys 0m0.049s then we get the same answer much faster. For "--all", that answer will correspond closely to "du objects/pack", of course. But we're actually checking reachability here, so we're still fast when we ask for more interesting things: $ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10 374798628 real 0m0.429s user 0m0.356s sys 0m0.072s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 09:57:55 -08:00
Johannes Schindelin	c85eec7fc3	commit-graph: when incompatible with graphs, indicate why When `gc.writeCommitGraph = true`, it is possible that the commit-graph is _still_ not written: replace objects, grafts and shallow repositories are incompatible with the commit-graph feature. Under such circumstances, we need to indicate to the user why the commit-graph was not written instead of staying silent about it. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 09:33:01 -08:00
Johannes Schindelin	c809798b2a	reflog expire --stale-fix: be generous about missing objects Whenever a user runs `git reflog expire --stale-fix`, the most likely reason is that their repository is at least _somewhat_ corrupt. Which means that it is more than just possible that some objects are missing. If that is the case, that can currently let the command abort through the phase where it tries to mark all reachable objects. Instead of adding insult to injury, let's be gentle and continue as best as we can in such a scenario, simply by ignoring the missing objects and moving on. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 09:21:52 -08:00
Ævar Arnfjörð Bjarmason	c45dc9cf30	diff: plug memory leak from regcomp() on {log,diff} -I Fix a memory leak in `296d4a94e7` (diff: add -I<regex> that ignores matching changes, 2020-10-20) by freeing the memory it allocates in the newly introduced diff_free(). See the previous commit for details on that. This memory leak was intentionally introduced in `296d4a94e7`, see the discussion on a previous iteration of it in https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/ At that time freeing the memory was somewhat tedious, but since it isn't anymore with the newly introduced diff_free() let's use it. Let's retain the pattern for diff_free_file() and add a diff_free_ignore_regex(), even though (unlike "diff_free_file") we don't need to call it elsewhere. I think this'll make for more readable code than gradually accumulating a giant diff_free() function, sharing "int i" across unrelated code etc. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 09:21:07 -08:00
Ævar Arnfjörð Bjarmason	e900d494dc	diff: add an API for deferred freeing Add a diff_free() function to free anything we may have allocated in the "diff_options" struct, and the ability to make calling it a noop by setting "no_free" in "diff_options". This is required because when e.g. "git diff" is run we'll allocate things in that struct, use the diff machinery once, and then exit. But if we run e.g. "git log -p" we're going to re-use what we allocated across multiple diff_flush() calls, and only want to free things at the end. We've thus ended up with features like the recently added "diff -I"[1] where we'll leak memory. As it turns out it could have simply used the pattern established in `6ea57703f6` (log: prepare log/log-tree to reuse the diffopt.close_file attribute, 2016-06-22). Manually adding more such flags to things log_tree_commit() every time we need to allocate something would be tedious. Let's instead move that fclose() code it to a new diff_free(), in anticipation of freeing more things in that function in follow-up commits. Some functions such as log_tree_commit() need an idiom of optionally retaining a previous "no_free", as they may either free the memory themselves, or their caller may do so. I'm keeping that idiom in log_show_early() for good measure, even though I don't think it's currently called in this manner. It also gets passed an existing "struct rev_info", so future callers may want to set the "no_free" flag. This change is a bit hard to read because while the freeing pattern we're introducing isn't unusual, the "file" member is a special snowflake. We usually don't want to fclose() it. This is because "file" is usually stdout, in which case we don't want to fclose() it. We only want to opt-in to closing it when we e.g. open a file on the filesystem. Thus the opt-in "close_file" flag. So the API in general just needs a "no_free" flag to defer freeing, but the "file" member still needs its "close_file" flag. This is made more confusing because while refactoring this code we could replace some "close_file=0" with "no_free=1", whereas others need to set both flags. This is because there were some cases where an existing "close_file=0" meant "let's defer deallocation", and others where it meant "we don't want to close this file handle at all". 1. `296d4a94e7` (diff: add -I<regex> that ignores matching changes, 2020-10-20) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 09:21:05 -08:00
Ævar Arnfjörð Bjarmason	1108cea7f8	tests: remove most uses of test_i18ncmp As a follow-up to `d162b25f95` (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20) remove most uses of test_i18ncmp via a simple s/test_i18ncmp/test_cmp/g search-replacement. I'm leaving t6300-for-each-ref.sh out due to a conflict with in-flight changes between "master" and "seen", as well as the prerequisite itself due to other changes between "master" and "next/seen" which add new test_i18ncmp uses. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:48:27 -08:00
Ævar Arnfjörð Bjarmason	b1e079807b	tests: remove last uses of C_LOCALE_OUTPUT Remove the last uses of the C_LOCALE_OUTPUT prerequisite as well as the prerequisite itself. This is a follow-up to `d162b25f95` (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20), as well as the preceding commit where we removed the simpler uses of C_LOCALE_OUTPUT. Here I'm slightly refactoring a test added in `21e5ad50fc` (safecrlf: Add mechanism to warn about irreversible crlf conversions, 2008-02-06), as well as getting rid of another "test_have_prereq C_LOCALE_OUTPUT" use. I'm not leaving the prerequisite itself in place for in-flight changes as there currently are none that introduce new tests that rely on it, and because C_LOCALE_OUTPUT is currently a noop on the master branch we likely won't have any new submissions that use it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:48:27 -08:00
Ævar Arnfjörð Bjarmason	a926c4b904	tests: remove most uses of C_LOCALE_OUTPUT As a follow-up to `d162b25f95` (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20) remove those uses of the now always true C_LOCALE_OUTPUT prerequisite from those tests which declare it as an argument to test_expect_{success,failure}. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:48:26 -08:00
Ævar Arnfjörð Bjarmason	780aa0a21e	tests: remove last uses of GIT_TEST_GETTEXT_POISON=false Follow-up my `73c01d25fe` (tests: remove uses of GIT_TEST_GETTEXT_POISON=false, 2021-01-20) by removing the last uses of GIT_TEST_GETTEXT_POISON=*. These assignments were part of branch that was in-flight at the time of the gettext poison removal. See `466f94ec45` (Merge branch 'ab/detox-gettext-tests', 2021-02-10) and `c7d6d419b0` (Merge branch 'ab/mktag', 2021-01-25) for the merging of the two branches. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:48:26 -08:00
Martin von Zweigbergk	fa9ab027ba	docs: clarify that refs/notes/ do not keep the attached objects alive `git help gc` contains this snippet: "[...] it will keep [..] objects referenced by the index, remote-tracking branches, notes saved by git notes under refs/notes/" I had interpreted that as saying that the objects that notes were attached to are kept, but that is not the case. Let's clarify the documentation by moving out the part about git notes to a separate sentence. Signed-off-by: Martin von Zweigbergk <martinvonz@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:43:55 -08:00
brian m. carlson	9b27b49240	gpg-interface: remove other signature headers before verifying When we have a multiply signed commit, we need to remove the signature in the header before verifying the object, since the trailing signature will not be over both pieces of data. Do so, and verify that we validate the signature appropriately. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:35:42 -08:00
brian m. carlson	88bce0e24c	ref-filter: hoist signature parsing When we parse a signature in the ref-filter code, we continually increment the buffer pointer. Hoist the signature parsing above the blank line delimiting headers and body so we can find the signature when using a header to sign the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:35:42 -08:00
brian m. carlson	937032e14a	commit: allow parsing arbitrary buffers with headers Currently only commits are signed with headers. However, in the future, we'll also sign tags with headers as well. Let's refactor out a function called parse_buffer_signed_by_header which does exactly that. In addition, since we'll want to sign things other than commits this way, let's call the function sign_with_header instead of do_sign_commit. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:35:42 -08:00
brian m. carlson	482c119186	gpg-interface: improve interface for parsing tags We have a function which parses a buffer with a signature at the end, parse_signature, and this function is used for signed tags. However, we'll need to store values for multiple algorithms, and we'll do this by using a header for the non-default algorithm. Adjust the parse_signature interface to store the parsed data in two strbufs and turn the existing function into parse_signed_buffer. The latter is still used in places where we know we always have a signed buffer, such as push certs. Adjust all the callers to deal with this new interface. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:35:42 -08:00
Junio C Hamano	c6102b7585	Merge branch 'tb/ci-run-cocci-with-18.04' The version of Ubuntu Linux used by default at GitHub Actions CI has been updated to one that lack coccinelle; until it gets fixed, work it around by sticking to the previous release (18.04). * tb/ci-run-cocci-with-18.04: .github/workflows/main.yml: run static-analysis on bionic	2021-02-10 16:48:07 -08:00
Junio C Hamano	f9f2520108	The seventh batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 14:48:33 -08:00
Junio C Hamano	466f94ec45	Merge branch 'ab/detox-gettext-tests' Get rid of "GETTEXT_POISON" support altogether, which may or may not be controversial. * ab/detox-gettext-tests: tests: remove uses of GIT_TEST_GETTEXT_POISON=false tests: remove support for GIT_TEST_GETTEXT_POISON ci: remove GETTEXT_POISON jobs	2021-02-10 14:48:33 -08:00
Junio C Hamano	59ace284f3	Merge branch 'ab/grep-pcre-invalid-utf8' Update support for invalid UTF-8 in PCRE2. * ab/grep-pcre-invalid-utf8: grep/pcre2: better support invalid UTF-8 haystacks grep/pcre2 tests: don't rely on invalid UTF-8 data test	2021-02-10 14:48:33 -08:00
Junio C Hamano	0199c68d01	Merge branch 'ab/retire-pcre1' The support for deprecated PCRE1 library has been dropped. * ab/retire-pcre1: Remove support for v1 of the PCRE library config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag	2021-02-10 14:48:33 -08:00
Junio C Hamano	938ecaa42f	Merge branch 'jk/pretty-lazy-load-commit' Some pretty-format specifiers do not need the data in commit object (e.g. "%H"), but we were over-eager to load and parse it, which has been made even lazier. * jk/pretty-lazy-load-commit: pretty: lazy-load commit data when expanding user-format	2021-02-10 14:48:33 -08:00
Junio C Hamano	2f794620f5	Merge branch 'ds/more-index-cleanups' Cleaning various codepaths up. * ds/more-index-cleanups: t1092: test interesting sparse-checkout scenarios test-lib: test_region looks for trace2 regions sparse-checkout: load sparse-checkout patterns name-hash: use trace2 regions for init repository: add repo reference to index_state fsmonitor: de-duplicate BUG()s around dirty bits cache-tree: extract subtree_pos() cache-tree: simplify verify_cache() prototype cache-tree: clean up cache_tree_update()	2021-02-10 14:48:33 -08:00
Junio C Hamano	02fb21617e	Merge branch 'rs/worktree-list-verbose' `git worktree list` now annotates worktrees as prunable, shows locked and prunable attributes in --porcelain mode, and gained a --verbose option. * rs/worktree-list-verbose: worktree: teach `list` verbose mode worktree: teach `list` to annotate prunable worktree worktree: teach `list --porcelain` to annotate locked worktree t2402: ensure locked worktree is properly cleaned up worktree: teach worktree_lock_reason() to gently handle main worktree worktree: teach worktree to lazy-load "prunable" reason worktree: libify should_prune_worktree()	2021-02-10 14:48:32 -08:00
Junio C Hamano	7e94720c1e	Merge branch 'js/rebase-i-commit-cleanup-fix' When "git rebase -i" processes "fixup" insn, there is no reason to clean up the commit log message, but we did the usual stripspace processing. This has been corrected. * js/rebase-i-commit-cleanup-fix: rebase -i: do leave commit message intact in fixup! chains	2021-02-10 14:48:32 -08:00
Junio C Hamano	e5abed92f5	Merge branch 'jk/t0000-cleanups' Code clean-up. * jk/t0000-cleanups: t0000: consistently use single quotes for outer tests t0000: run cleaning test inside sub-test t0000: run prereq tests inside sub-test t0000: keep clean-up tests together	2021-02-10 14:48:32 -08:00
Junio C Hamano	04703f64be	Merge branch 'sg/t7800-difftool-robustify' Test fix. * sg/t7800-difftool-robustify: t7800-difftool: don't accidentally match tmp dirs	2021-02-10 14:48:32 -08:00
Junio C Hamano	c9f94ab4fa	Merge branch 'ab/lose-grep-debug' Lose the debugging aid that may have been useful in the past, but no longer is, in the "grep" codepaths. * ab/lose-grep-debug: grep/log: remove hidden --debug and --grep-debug options	2021-02-10 14:48:31 -08:00
Junio C Hamano	9d5b1c06ac	Merge branch 'jk/use-oid-pos' Code clean-up to ensure our use of hashtables using object names as keys use the "struct object_id" objects, not the raw hash values. * jk/use-oid-pos: oid_pos(): access table through const pointers hash_pos(): convert to oid_pos() rerere: use strmap to store rerere directories rerere: tighten rr-cache dirname check rerere: check dirname format while iterating rr_cache directory commit_graft_pos(): take an oid instead of a bare hash	2021-02-10 14:48:31 -08:00
Eric Wong	a5cdca4520	t1500: ensure current --since= behavior remains This behavior of git-rev-parse is observed since git 1.8.3.1 at least(), and likely earlier versions. At least one git-reliant project in-the-wild relies on this current behavior of git-rev-parse being able to handle multiple --since= arguments without squeezing identical results together. So add a test to prevent the potential for regression in downstream projects. () 1.8.3.1 the version packaged for CentOS 7.x Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 14:24:13 -08:00
Ævar Arnfjörð Bjarmason	59934417ff	t/.gitattributes: sort lines Sort the lines starting with "/", the only out-of-place line was added along with most of the file in `614f4f0f35` (Fix the remaining tests that failed with core.autocrlf=true, 2017-05-09). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	ddfe900612	test-lib-functions: move function to lib-bitmap.sh Move a function added to test-lib-functions.sh in `ea047a8eb4` (t5310: factor out bitmap traversal comparison, 2020-02-14) into a new lib-bitmap.sh. The test-lib-functions.sh file should be for functions that are widely used across the test suite, if something's only used by a few tests it makes more sense to have it in a lib-*.sh file. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	3fca1fc651	test libs: rename gitweb-lib.sh to lib-gitweb.sh Rename gitweb-lib.sh to lib-gitweb.sh for consistency with other test library files. When it was introduced in `05526071cb` (gitweb: split test suite into library and tests, 2009-08-27) this naming pattern was more common. Since then all but one other such library which didn't start with "lib-.sh" such as t6000lib.sh has been been renamed, see e.g. `9d488eb40e` (Move t6000lib.sh to lib-, 2010-05-07). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	e8a8e7ff98	test libs: rename bundle helper to "lib-bundle.sh" Rename the recently introduced test-bundle-functions.sh to be consistent with other lib-.sh files, which is the convention for these sorts of shared test library functions. The new test-bundle-functions.sh was introduced in `9901164d81` (test: add helper functions for git-bundle, 2021-01-11). It was the only test-.sh of this nature. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	f3ad2bf471	test-lib-functions: remove generate_zero_bytes() wrapper Since `d5cfd142ec` (tests: teach the test-tool to generate NUL bytes and use it, 2019-02-14) the generate_zero_bytes() functions has been a thin wrapper for "test-tool genzeros". Let's have its only user call that directly instead. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	762ccf9906	test-lib-functions: move test_set_index_version() to its user Move the test_set_index_version() function to its only user. This function has only been used in one place since its addition in `5d9fc888b4` (test-lib: allow setting the index format version, 2014-02-23). Let's have that test script define it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	9e9c7dd6f1	test lib: change "error" to "BUG" as appropriate Change two uses of "error" in test-lib-functions.sh to "BUG". In the first instance in "test_cmp_rev" the author of the "BUG" function added in [1] had another in-flight patch adding this in [2], and the two were never consolidated. In the second case in "test_atexit" added in [3] that we could have instead used "BUG" appears to have been missed. 1. `165293af3c` (tests: send "bug in the test script" errors to the script's stderr, 2018-11-19) 2. `30d0b6dccb` (test-lib-functions: make 'test_cmp_rev' more informative on failure, 2018-11-19) 3. `900721e15c` (test-lib: introduce 'test_atexit', 2019-03-13) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Ævar Arnfjörð Bjarmason	c0eedbc009	test-lib: remove check_var_migration Remove the check_var_migration() migration helper. This was added back in [1], [2] and [3] to warn users to migrate from e.g. the "GIT_FSMONITOR_TEST" name to "GIT_TEST_FSMONITOR". I daresay that having been warning about this since late 2018 (or v2.20.0) was sufficient time to give everyone interested a heads-up about moving to the new names. I don't see the need for going through the "do this later" codepath anticipated in [1], let's just remove this instead. 1. `4cb54d0aa8` (fsmonitor: update GIT_TEST_FSMONITOR support, 2018-09-18) 2. `1f357b045b` (read-cache: update TEST_GIT_INDEX_VERSION support, 2018-09-18) 3. `5765d97b71` (preload-index: update GIT_FORCE_PRELOAD_TEST support, 2018-09-18) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:54:34 -08:00
Jeff King	a38cb9878a	mailmap: only look for .mailmap in work tree When trying to find a .mailmap file, we will always look for it in the current directory. This makes sense in a repository with a working tree, since we'd always go to the toplevel directory at startup. But for a bare repository, it can be confusing. With an option like --git-dir (or $GIT_DIR in the environment), we don't chdir at all, and we'd read .mailmap from whatever directory you happened to be in before starting Git. (Note that --git-dir without specifying a working tree historically means "the current directory is the root of the working tree", but most bare repositories will have core.bare set these days, meaning they will realize there is no working tree at all). The documentation for gitmailmap(5) says: If the file `.mailmap` exists at the toplevel of the repository[...] which likewise reinforces the notion that we are looking in the working tree. This patch prevents us from looking for such a file when we're in a bare repository. This does break something that used to work: cd bare.git git cat-file blob HEAD:.mailmap >.mailmap git shortlog But that was never advertised in the documentation. And these days we have mailmap.blob (which defaults to HEAD:.mailmap) to do the same thing in a much cleaner way. However, there's one more interesting case: we might not have a repository at all! The git-shortlog command can be run with git-log output fed on its stdin, and it will apply the mailmap. In that case, it probably does make sense to read .mailmap from the current directory. This patch will continue to do so. That leads to one even weirder case: if you run git-shortlog to process stdin, the input _could_ be from a different repository entirely. Should we respect the in-tree .mailmap then? Probably yes. Whatever the source of the input, if shortlog is running in a repository, the documentation claims that we'd read the .mailmap from its top-level (and of course it's reasonably likely that it _is_ from the same repo, and the user just preferred to run git-log and git-shortlog separately for whatever reason). The included test covers these cases, and we now document the "no repo" case explicitly. We also add a test that confirms we find a top-level ".mailmap" even when we start in a subdirectory of the working tree. This worked both before and after this commit, but we never tested it explicitly (it works because we always chdir to the top-level of the working tree if there is one). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 13:34:51 -08:00
Johannes Schindelin	e89f89361c	fsck --name-objects: be more careful parsing generation numbers In `7b35efd734` (fsck_walk(): optionally name objects on the go, 2016-07-17), the `fsck` machinery learned to optionally name the objects, so that it is easier to see what part of the repository is in a bad shape, say, when objects are missing. To save on complexity, this machinery uses a parser to determine the name of a parent given a commit's name: any `~<n>` suffix is parsed and the parent's name is formed from the prefix together with `~<n+1>`. However, this parser has a bug: if it finds a suffix `<n>` that is _not_ `~<n>`, it will mistake the empty string for the prefix and `<n>` for the generation number. In other words, it will generate a name of the form `~<bogus-number>`. Let's fix this. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 12:38:05 -08:00
Johannes Schindelin	8c891eed3a	t1450: robustify `remove_object()` This function can be simplified by using the `test_oid_to_path()` helper, which incidentally also makes it more robust by not relying on the exact file system layout of the loose object files. While at it, do not define those functions in a test case, it buys us nothing. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 12:38:00 -08:00
Matheus Tavares	42d906bec4	grep: honor sparse-checkout on working tree searches On a sparse checked out repository, `git grep` (without --cached) ends up searching the cache when an entry matches the search pathspec and has the SKIP_WORKTREE bit set. This is confusing both because the sparse paths are not expected to be in a working tree search (as they are not checked out), and because the output mixes working tree and cache results without distinguishing them. (Note that grep also resorts to the cache on working tree searches that include --assume-unchanged paths. But the whole point in that case is to assume that the contents of the index entry and the file are the same. This does not apply to the case of sparse paths, where the file isn't even expected to be present.) Fix that by teaching grep to honor the sparse-checkout rules for working tree searches. If the user wants to grep paths outside the current sparse-checkout definition, they may either update the sparsity rules to materialize the files, or use --cached to search all blobs registered in the index. Note: it might also be interesting to add a configuration option that allow users to search paths that are present despite having the SKIP_WORKTREE bit set, and/or to restrict searches in the index and past revisions too. These ideas are left as future improvements to avoid conflicting with other sparse-checkout topics currently in flight. Suggested-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 23:10:51 -08:00
Derrick Stolee	acc1c4d5d4	maintenance: incremental strategy runs pack-refs weekly When the 'maintenance.strategy' config option is set to 'incremental', a default maintenance schedule is enabled. Add the 'pack-refs' task to that strategy at the weekly cadence. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 23:09:29 -08:00
Derrick Stolee	41abfe15d9	maintenance: add pack-refs task It is valuable to collect loose refs into a more compressed form. This is typically the packed-refs file, although this could be the reftable in the future. Having packed refs can be extremely valuable in repos with many tags or remote branches that are not modified by the local user, but still are necessary for other queries. For instance, with many exploded refs, commands such as git describe --tags --exact-match HEAD can be very slow (multiple seconds). This command in particular is used by terminal prompts to show when a detatched HEAD is pointing to an existing tag, so having it be slow causes significant delays for users. Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all --prune' to move loose refs into a packed form. For now, that is the packed-refs file, but could adjust to other file formats in the future. This is the first of several sub-tasks of the 'gc' task that could be extracted to their own tasks. In this process, we should not change the behavior of the 'gc' task since that remains the default way to keep repositories maintained. Creating a new task for one of these sub-tasks only provides more customization options for those choosing to not use the 'gc' task. It is certainly possible to have both the 'gc' and 'pack-refs' tasks enabled and run regularly. While they may repeat effort, they do not conflict in a destructive way. The 'auto_condition' function pointer is left NULL for now. We could extend this in the future to have a condition check if pack-refs should be run during 'git maintenance run --auto'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 23:09:24 -08:00
Jonathan Tan	0a9dde4a04	usage: trace2 BUG() invocations die() messages are traced in trace2, but BUG() messages are not. Anyone tracking die() messages would have even more reason to track BUG(). Therefore, write to trace2 when BUG() is invoked. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 14:14:34 -08:00
Seth House	9d9cf23031	mergetool: add per-tool support and overrides for the hideResolved flag Add a per-tool override flag so that users may enable the flag for one tool and disable it for another by setting `mergetool.<tool>.hideResolved` to `false`. In addition, the author or maintainer of a mergetool may optionally override the default `hideResolved` value for that mergetool. If the `mergetools/<tool>` shell script contains a `hide_resolved_enabled` function it will be called when the mergetool is invoked and the return value will be used as the default for the `hideResolved` flag. hide_resolved_enabled () { return 1 } Disabling may be desirable if the mergetool wants or needs access to the original, unmodified 'LOCAL' and 'REMOTE' versions of the conflicted file. For example: - A tool may use a custom conflict resolution algorithm and prefer to ignore the results of Git's conflict resolution. - A tool may want to visually compare/constrast the version of the file from before the merge (saved to 'LOCAL', 'REMOTE', and 'BASE') with Git's conflict resolution results (saved to 'MERGED'). Helped-by: Johannes Sixt <j6t@kdbg.org> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Seth House <seth@eseth.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 14:09:16 -08:00
Seth House	de8dafbada	mergetool: break setup_tool out into separate initialization function This is preparation for the following commit where we need to source the mergetool shell script to look for overrides before `run_merge_tool` is called. Previously `run_merge_tool` both sourced that script and invoked the mergetool. In the case of the following commit, we need the result of the `hide_resolved` override, if present, before we actually run `run_merge_tool`. The new `initialize_merge_tool` wrapper is exposed and documented as a public interface for consistency with the existing `run_merge_tool` which is also public. Although `setup_tool` could instead be exposed directly, the related `setup_user_tool` would probably also want to be elevated to match and this felt the cleanest to me. Signed-off-by: Seth House <seth@eseth.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 14:09:16 -08:00
Seth House	98ea309b3f	mergetool: add hideResolved configuration The purpose of a mergetool is to help the user resolve any conflicts that Git cannot automatically resolve. If there is a conflict that must be resolved manually Git will write a file named MERGED which contains everything Git was able to resolve by itself and also everything that it was not able to resolve wrapped in conflict markers. One way to think of MERGED is as a two- or three-way diff. If each "side" of the conflict markers is separately extracted an external tool can represent those conflicts as a side-by-side diff. However many mergetools instead diff LOCAL and REMOTE both of which contain versions of the file from before the merge. Since the conflicts Git resolved automatically are not present it forces the user to manually re-resolve those conflicts. Some mergetools also show MERGED but often only for reference and not as the focal point to resolve the conflicts. This adds a `mergetool.hideResolved` flag that will overwrite LOCAL and REMOTE with each corresponding "side" of a conflicted file and thus hide all conflicts that Git was able to resolve itself. Overwriting these files will immediately benefit any mergetool that uses them without requiring any changes to the tool. No adverse effects were noted in a small survey of popular mergetools[1] so this behavior defaults to `true`. However it can be globally disabled by setting `mergetool.hideResolved` to `false`. [1] https://www.eseth.org/2020/mergetools.html `c884424769/2020/mergetools.md` Original-implementation-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Seth House <seth@eseth.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 14:09:16 -08:00
Jeff King	3803a3a099	t: add --no-tag option to test_commit One of the conveniences that test_commit offers is making a tag for each commit. This makes it easy to refer to the commits in subsequent commands. But it can also be a pain if you care about reachability, because those tags keep the commits reachable even if they are rewound from the branch they're made on. The alternative is that scripts have to call test_tick, git-add, and git-commit themselves. Let's add a --no-tag option to give them the one-liner convenience of using test_commit. This is in preparation for the next patch, which will add some more calls. But I cleaned up an existing site to show off the feature. There are probably more cleanups possible. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 13:36:06 -08:00
Matheus Tavares	0c5d83b248	grep: error out if --untracked is used with --cached The options --untracked and --cached are not compatible, but if they are used together, grep just silently ignores --cached and searches the working tree. Error out, instead, to avoid any potential confusion. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 12:39:06 -08:00
Junio C Hamano	1d4f2316c5	Sync with 2.30.1	2021-02-08 14:44:42 -08:00
Johannes Schindelin	2cc543deab	range-diff(docs): explain how to specify commit ranges There are three forms, depending whether the user specifies one, two or three non-option arguments. We've never actually explained how this works in the manual, so let's explain it. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-06 21:24:55 -08:00
Johannes Schindelin	359f0d754a	range-diff/format-patch: handle commit ranges other than A..B In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are described to specify commit ranges that `range-diff` does not yet accept: "<commit>^!" and "<commit>^-<n>". Let's accept them, by parsing them via the revision machinery and looking for at least one interesting and one uninteresting revision in the resulting `pending` array. This also finally lets us reject arguments that _do_ contain `..` but are not actually ranges, e.g. `HEAD^{/do.. match this}`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-06 21:24:55 -08:00
Johannes Schindelin	1e79f97326	range-diff: offer --left-only/--right-only options When comparing commit ranges, one is frequently interested only in one side, such as asking the question "Has this patch that I submitted to the Git mailing list been applied?": one would only care about the part of the output that corresponds to the commits in a local branch. To make that possible, imitate the `git rev-list` options `--left-only` and `--right-only`. This addresses https://github.com/gitgitgadget/git/issues/206 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-06 21:14:31 -08:00
Johannes Schindelin	3e6046edad	range-diff: move the diffopt initialization down one layer It is actually only the `output()` function that uses those diffopts. By moving the diffopt initialization down into that function, it is encapsulated better. Incidentally, it will also make it easier to implement the `--left-only` and `--right-only` options in `git range-diff` because the `output()` function is now receiving all range-diff options as a parameter, not just the diffopts. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-06 21:14:31 -08:00
Johannes Schindelin	f1ce6c191e	range-diff: combine all options in a single data structure This will make it easier to implement the `--left-only` and `--right-only` options. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-06 21:14:31 -08:00
Junio C Hamano	fb7fa4a1fd	Sync with maint	2021-02-05 16:41:17 -08:00
Junio C Hamano	4527ecdc8d	The sixth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 16:40:46 -08:00
Junio C Hamano	4513f6bbb1	Merge branch 'sg/test-stress-jobs' Test framework fix. * sg/test-stress-jobs: test-lib: prevent '--stress-jobs=X' from being ignored	2021-02-05 16:40:46 -08:00
Junio C Hamano	dfc3c2b224	Merge branch 'jk/weather-balloon-require-variadic-macro' We've carried compatibility codepaths for compilers without variadic macros for quite some time, but the world may be ready for them to be removed. Force compilation failure on exotic platforms where variadic macros are not available to find out who screams in such a way that we can easily revert if it turns out that the world is not yet ready. * jk/weather-balloon-require-variadic-macro: git-compat-util: always enable variadic macros	2021-02-05 16:40:46 -08:00
Junio C Hamano	b6c90a2a22	Merge branch 'pb/ci-matrix-wo-shortcut' Our setting of GitHub CI test jobs were a bit too eager to give up once there is even one failure found. Tweak the knob to allow other jobs keep running even when we see a failure, so that we can find more failures in a single run. * pb/ci-matrix-wo-shortcut: ci: do not cancel all jobs of a matrix if one fails	2021-02-05 16:40:46 -08:00
Junio C Hamano	61b159e219	Merge branch 'pb/blame-funcname-range-userdiff' Test fix. * pb/blame-funcname-range-userdiff: annotate-tests: quote variable expansions containing path names	2021-02-05 16:40:45 -08:00
Junio C Hamano	4cc0e8794d	Merge branch 'jk/p5303-sed-portability-fix' A perf script was made more portable. * jk/p5303-sed-portability-fix: p5303: avoid sed GNU-ism	2021-02-05 16:40:45 -08:00
Junio C Hamano	77db59c2f9	Merge branch 'jv/pack-objects-narrower-ref-iteration' The "pack-objects" command needs to iterate over all the tags when automatic tag following is enabled, but it actually iterated over all refs and then discarded everything outside "refs/tags/" hierarchy, which was quite wasteful. * jv/pack-objects-narrower-ref-iteration: builtin/pack-objects.c: avoid iterating all refs	2021-02-05 16:40:45 -08:00
Junio C Hamano	f6ef8baba2	Merge branch 'ph/use-delete-refs' When removing many branches and tags, the code used to do so one ref at a time. There is another API it can use to delete multiple refs, and it makes quite a lot of performance difference when the refs are packed. * ph/use-delete-refs: use delete_refs when deleting tags or branches	2021-02-05 16:40:45 -08:00
Junio C Hamano	6254fa1359	Merge branch 'tb/ls-refs-optim' The ls-refs protocol operation has been optimized to narrow the sub-hierarchy of refs/ it walks to produce response. * tb/ls-refs-optim: ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets ls-refs.c: initialize 'prefixes' before using it refs: expose 'for_each_fullref_in_prefixes'	2021-02-05 16:40:45 -08:00
Junio C Hamano	5198426d91	Merge branch 'zh/ls-files-deduplicate' "git ls-files" can and does show multiple entries when the index is unmerged, which is a source for confusion unless -s/-u option is in use. A new option --deduplicate has been introduced. * zh/ls-files-deduplicate: ls-files.c: add --deduplicate option ls_files.c: consolidate two for loops into one ls_files.c: bugfix for --deleted and --modified	2021-02-05 16:40:44 -08:00
Junio C Hamano	a0a2d75d3b	Merge branch 'ds/cache-tree-basics' Document, clean-up and optimize the code around the cache-tree extension in the index. * ds/cache-tree-basics: cache-tree: speed up consecutive path comparisons cache-tree: use ce_namelen() instead of strlen() index-format: discuss recursion of cache-tree better index-format: update preamble to cache tree extension index-format: use 'cache tree' over 'cached tree' cache-tree: trace regions for prime_cache_tree cache-tree: trace regions for I/O cache-tree: use trace2 in cache_tree_update() unpack-trees: add trace2 regions tree-walk: report recursion counts	2021-02-05 16:40:44 -08:00
Junio C Hamano	b65b9ff1ff	Merge branch 'en/ort-conflict-handling' ORT merge strategy learns more support for merge conflicts. * en/ort-conflict-handling: merge-ort: add handling for different types of files at same path merge-ort: copy find_first_merges() implementation from merge-recursive.c merge-ort: implement format_commit() merge-ort: copy and adapt merge_submodule() from merge-recursive.c merge-ort: copy and adapt merge_3way() from merge-recursive.c merge-ort: flesh out implementation of handle_content_merge() merge-ort: handle book-keeping around two- and three-way content merge merge-ort: implement unique_path() helper merge-ort: handle directory/file conflicts that remain merge-ort: handle D/F conflict where directory disappears due to merge	2021-02-05 16:40:44 -08:00
Junio C Hamano	aac006aa99	Merge branch 'so/log-diff-merge' "git log" learned a new "--diff-merges=<how>" option. * so/log-diff-merge: (32 commits) t4013: add tests for --diff-merges=first-parent doc/git-show: include --diff-merges description doc/rev-list-options: document --first-parent changes merges format doc/diff-generate-patch: mention new --diff-merges option doc/git-log: describe new --diff-merges options diff-merges: add '--diff-merges=1' as synonym for 'first-parent' diff-merges: add old mnemonic counterparts to --diff-merges diff-merges: let new options enable diff without -p diff-merges: do not imply -p for new options diff-merges: implement new values for --diff-merges diff-merges: make -m/-c/--cc explicitly mutually exclusive diff-merges: refactor opt settings into separate functions diff-merges: get rid of now empty diff_merges_init_revs() diff-merges: group diff-merge flags next to each other inside 'rev_info' diff-merges: split 'ignore_merges' field diff-merges: fix -m to properly override -c/--cc t4013: add tests for -m failing to override -c/--cc t4013: support test_expect_failure through ':failure' magic diff-merges: revise revs->diff flag handling diff-merges: handle imply -p on -c/--cc logic for log.c ...	2021-02-05 16:40:44 -08:00
Derrick Stolee	eb9071912f	commit-graph: anonymize data in chunk_write_fn In preparation for creating an API around file formats using chunks and tables of contents, prepare the commit-graph write code to use prototypes that will match this new API. Specifically, convert chunk_write_fn to take a "void *data" parameter instead of the commit-graph-specific "struct write_commit_graph_context" pointer. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 15:40:41 -08:00
Jonathan Tan	4f37d45706	clone: respect remote unborn HEAD Teach Git to use the "unborn" feature introduced in a previous patch as follows: Git will always send the "unborn" argument if it is supported by the server. During "git clone", if cloning an empty repository, Git will use the new information to determine the local branch to create. In all other cases, Git will ignore it. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 13:49:55 -08:00
Jonathan Tan	39835409d1	connect, transport: encapsulate arg in struct In a future patch we plan to return the name of an unborn current branch from deep in the callchain to a caller via a new pointer parameter that points at a variable in the caller when the caller calls get_remote_refs() and transport_get_remote_refs(). In preparation for that, encapsulate the existing ref_prefixes parameter into a struct. The aforementioned unborn current branch will go into this new struct in the future patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 13:49:54 -08:00
Jonathan Tan	59e1205d16	ls-refs: report unborn targets of symrefs When cloning, we choose the default branch based on the remote HEAD. But if there is no remote HEAD reported (which could happen if the target of the remote HEAD is unborn), we'll fall back to using our local init.defaultBranch. Traditionally this hasn't been a big deal, because most repos used "master" as the default. But these days it is likely to cause confusion if the server and client implementations choose different values (e.g., if the remote started with "main", we may choose "master" locally, create commits there, and then the user is surprised when they push to "master" and not "main"). To solve this, the remote needs to communicate the target of the HEAD symref, even if it is unborn, and "git clone" needs to use this information. Currently, symrefs that have unborn targets (such as in this case) are not communicated by the protocol. Teach Git to advertise and support the "unborn" feature in "ls-refs" (by default, this is advertised, but server administrators may turn this off through the lsrefs.unborn config). This feature indicates that "ls-refs" supports the "unborn" argument; when it is specified, "ls-refs" will send the HEAD symref with the name of its unborn target. This change is only for protocol v2. A similar change for protocol v0 would require independent protocol design (there being no analogous position to signal support for "unborn") and client-side plumbing of the data required, so the scope of this patch set is limited to protocol v2. The client side will be updated to use this in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 13:49:53 -08:00
Thomas Ackermann	6eda9ac9e5	doc: use https links Use only https links for lore.kernel.org. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:10 -08:00
Thomas Ackermann	1d18997007	doc hash-function-transition: move rationale upwards Move rationale for new hash function to beginning of document so that it appears before the concrete move to SHA-256 is described. Remove some of the details about SHA-1 weaknesses and add references to the details on how the new hash function was chosen instead. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:10 -08:00
Thomas Ackermann	cc9f0916bd	doc hash-function-transition: fix incomplete sentence Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Thomas Ackermann	810372f881	doc hash-function-transition: use upper case consistently Use upper case consistently in Document History. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Thomas Ackermann	af9b1e9aba	doc hash-function-transition: use SHA-1 and SHA-256 consistently Use SHA-1 and SHA-256 instead of sha1 and sha256 when referring to the hash type. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Thomas Ackermann	de82095a95	doc hash-function-transition: fix asciidoc output Asciidoc requires lists to start with an empty line and uses different characters for indentation levels ("-", "", "*", ...). For special symbols like a dash "--" has to be used and there is no double arrow "<->", so a left and right arrow "<-->" has to be combined for that. Lastly for verbatim output a newline followed by an indentation has to be used. Fix asciidoc output for lists, special characters and verbatim text while retaining the readabilty of the original text file. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Johannes Schindelin	5189bb8724	range-diff: simplify code spawning `git log` Previously, we waited for the child process to be finished in every failing code path as well as at the end of the function `show_range_diff()`. However, we do not need to wait that long. Directly after reading the output of the child process, we can wrap up the child process. This also has the advantage that we don't do a bunch of unnecessary work in case `finish_command()` returns with an error anyway. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-04 17:16:42 -08:00
Johannes Schindelin	a2d474adf3	range-diff: libify the read_patches() function again In library functions, we do want to avoid the (simple, but rather final) `die()` calls, instead returning with a value indicating an error. Let's do exactly that in the code introduced in `b66885a30c` (range-diff: add section header instead of diff header, 2019-07-11) that wants to error out if a diff header could not be parsed. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-04 17:16:42 -08:00
Johannes Schindelin	8c29b49794	range-diff: avoid leaking memory in two error code paths In the code paths in question, we already release a lot of memory, but the `current_filename` variable was missed. Fix that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-04 17:16:42 -08:00
Junio C Hamano	30b29f044a	The fifth batch	2021-02-03 15:04:49 -08:00
Junio C Hamano	22f2bce651	Merge branch 'jk/run-command-use-shell-doc' The .use_shell flag in struct child_process that is passed to run_command() API has been clarified with a bit more documentation. * jk/run-command-use-shell-doc: run-command: document use_shell option	2021-02-03 15:04:49 -08:00
Junio C Hamano	973e20b83f	Merge branch 'jk/peel-iterated-oid' The peel_ref() API has been replaced with peel_iterated_oid(). * jk/peel-iterated-oid: refs: switch peel_ref() to peel_iterated_oid()	2021-02-03 15:04:49 -08:00
Junio C Hamano	6cd7f9dc29	Merge branch 'js/skip-dashed-built-ins-from-config-mak' Build fix. * js/skip-dashed-built-ins-from-config-mak: SKIP_DASHED_BUILT_INS: respect `config.mak`	2021-02-03 15:04:49 -08:00
Junio C Hamano	d03553ecd1	Merge branch 'jt/packfile-as-uri-doc' Doc fix for packfile URI feature. * jt/packfile-as-uri-doc: Doc: clarify contents of packfile sent as URI	2021-02-03 15:04:49 -08:00
Junio C Hamano	15bf48b987	Merge branch 'ds/maintenance-prefetch-cleanup' Test clean-up plus UI improvement by hiding extra refs that the prefetch task uses from "log --decorate" output. * ds/maintenance-prefetch-cleanup: t7900: clean up some broken refs maintenance: set log.excludeDecoration durin prefetch	2021-02-03 15:04:48 -08:00
Junio C Hamano	18e3f5a944	Merge branch 'ab/fsck-doc-fix' Documentation for "git fsck" lost stale bits that has become incorrect. * ab/fsck-doc-fix: fsck doc: remove ancient out-of-date diagnostics	2021-02-03 15:04:48 -08:00
Pranit Bauva	97b8294474	bisect--helper: retire `--check-and-set-terms` subcommand The `--check-and-set-terms` subcommand is no longer from the git-bisect.sh shell script. Instead the function `check_and_set_terms()` is called from the C implementation. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:09 -08:00
Pranit Bauva	e4c7b33747	bisect--helper: reimplement `bisect_skip` shell function in C Reimplement the `bisect_skip()` shell function in C and also add `bisect-skip` subcommand to `git bisect--helper` to call it from git-bisect.sh Using `--bisect-skip` subcommand is a temporary measure to port shell function to C so as to use the existing test suite. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:09 -08:00
Pranit Bauva	9feea34810	bisect--helper: retire `--bisect-auto-next` subcommand The --bisect-auto-next subcommand is no longer used from the git-bisect.sh shell script. Instead the function bisect_auto_next() is directly called from the C implementation. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:09 -08:00
Pranit Bauva	b7a6f163d6	bisect--helper: use `res` instead of return in BISECT_RESET case option Use `res` variable to store `bisect_reset()` output in BISECT_RESET case option to make bisect--helper.c more consistent. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:09 -08:00
Pranit Bauva	68efed8c8a	bisect--helper: retire `--bisect-write` subcommand The `--bisect-write` subcommand is no longer used from the git-bisect.sh shell script. Instead the function `bisect_write()` is directly called from the C implementation. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:08 -08:00
Pranit Bauva	2b1fd947f6	bisect--helper: reimplement `bisect_replay` shell function in C Reimplement the `bisect_replay` shell function in C and also add `--bisect-replay` subcommand to `git bisect--helper` to call it from git-bisect.sh Using `--bisect-replay` subcommand is a temporary measure to port shell function to C so as to use the existing test suite. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:08 -08:00
Pranit Bauva	97d5ba6a39	bisect--helper: reimplement `bisect_log` shell function in C Reimplement the `bisect_log()` shell function in C and also add `--bisect-log` subcommand to `git bisect--helper` to call it from git-bisect.sh . Using `--bisect-log` subcommand is a temporary measure to port shell function to C so as to use the existing test suite. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:52:08 -08:00
Jeff King	27dc071b9a	doc/git-branch: fix awkward wording for "-c" The description for "-c" is hard to parse. I think the big issue is lack of commas, but I've also reordered the words to keep the main focus point of "instead of renaming, copy" together. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:14:31 -08:00
Jeff King	bca362c1f9	completion: handle other variants of "branch -m" We didn't special-case "branch -M" (with a capital M) the same as "branch -m", nor any of the "--copy" variants. As a result these offered any ref as the next candidate, and not just branch names. Note that I rewrapped case-arm line since it's now quite long, and likewise the one below it for consistency. I also re-ordered the existing "-D" to make it more obvious how the cases group together. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:14:24 -08:00
Torsten Bögershausen	5c327502db	MacOS: precompose_argv_prefix() The following sequence leads to a "BUG" assertion running under MacOS: DIR=git-test-restore-p Adiarnfd=$(printf 'A\314\210') DIRNAME=xx${Adiarnfd}yy mkdir $DIR && cd $DIR && git init && mkdir $DIRNAME && cd $DIRNAME && echo "Initial" >file && git add file && echo "One more line" >>file && echo y \| git restore -p . Initialized empty Git repository in /tmp/git-test-restore-p/.git/ BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-index --cached --numstat [snip] The command `git restore` is run from a directory inside a Git repo. Git needs to split the $CWD into 2 parts: The path to the repo and "the rest", if any. "The rest" becomes a "prefix" later used inside the pathspec code. As an example, "/path/to/repo/dir-inside-repå" would determine "/path/to/repo" as the root of the repo, the place where the configuration file .git/config is found. The rest becomes the prefix ("dir-inside-repå"), from where the pathspec machinery expands the ".", more about this later. If there is a decomposed form, (making the decomposing visible like this), "dir-inside-rep°a" doesn't match "dir-inside-repå". Git commands need to: (a) read the configuration variable "core.precomposeunicode" (b) precocompose argv[] (c) precompose the prefix, if there was any The first commit, `76759c7dff` "git on Mac OS and precomposed unicode" addressed (a) and (b). The call to precompose_argv() was added into parse-options.c, because that seemed to be a good place when the patch was written. Commands that don't use parse-options need to do (a) and (b) themselfs. The commands `diff-files`, `diff-index`, `diff-tree` and `diff` learned (a) and (b) in commit `90a78b83e0` "diff: run arguments through precompose_argv" Branch names (or refs in general) using decomposed code points resulting in decomposed file names had been fixed in commit `8e712ef6fc` "Honor core.precomposeUnicode in more places" The bug report from above shows 2 things: - more commands need to handle precomposed unicode - (c) should be implemented for all commands using pathspecs Solution: precompose_argv() now handles the prefix (if needed), and is renamed into precompose_argv_prefix(). Inside this function the config variable core.precomposeunicode is read into the global variable precomposed_unicode, as before. This reading is skipped if precomposed_unicode had been read before. The original patch for preocomposed unicode, `76759c7dff`, placed precompose_argv() into parse-options.c Now add it into git.c::run_builtin() as well. Existing precompose calls in diff-files.c and others may become redundant, and if we audit the callflows that reach these places to make sure that they can never be reached without going through the new call added to run_builtin(), we might be able to remove these existing ones. But in this commit, we do not bother to do so and leave these precompose callsites as they are. Because precompose() is idempotent and can be called on an already precomposed string safely, this is safer than removing existing calls without fully vetting the callflows. There is certainly room for cleanups - this change intends to be a bug fix. Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should be done in future commits. [1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) [2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/ Reported-by: Daniel Troger <random_n0body@icloud.com> Helped-By: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-03 14:09:37 -08:00
Jeff King	a534cf4f4d	completion: treat "branch -D" the same way as "branch -d" The former offers not just branches but tags as completion candidates. Mimic how "branch -d" limits its suggestion to branch names. Reported-by: Paul Jolly <paul@myitcv.io> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-02 13:26:10 -08:00
Jacob Vosmaer	ad6b5fefbd	t5544: clarify 'hook works with partial clone' test Apply a few leftover improvements from the review of `ad5df6b782` (upload-pack.c: fix filter spec quoting bug). 1. Instead of enumerating objects reachable from HEAD, enumerate all reachable objects, because HEAD has not special significance in this test. 2. Instead of relying on the knowledge that "? in rev-list output means partial clone", explicitly verify that there are no blobs with cat-file. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-02 12:21:38 -08:00
Pratyush Yadav	7da7ef6d7a	Merge branch 'mk/russian-translation' Fix typo in Russian translation. * mk/russian-translation: git-gui: fix typo in russian locale	2021-02-02 23:51:30 +05:30
Mikhail Klyushin	413e96f41e	git-gui: fix typo in russian locale Fixed typo in russian locale: издекса -> индекса Signed-off-by: Mikhail Klyushin <klyushinmisha@gmail.com> Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>	2021-02-02 23:50:31 +05:30
Ævar Arnfjörð Bjarmason	be8fc53e36	pager: properly log pager exit code when signalled When git invokes a pager that exits with non-zero the common case is that we'll already return the correct SIGPIPE failure from git itself, but the exit code logged in trace2 has always been incorrectly reported[1]. Fix that and log the correct exit code in the logs. Since this gives us something to test outside of our recently-added tests needing a !MINGW prerequisite, let's refactor the test to run on MINGW and actually check for SIGPIPE outside of MINGW. The wait_or_whine() is only called with a true "in_signal" from from finish_command_in_signal(), which in turn is only used in pager.c. The "in_signal && !WIFEXITED(status)" case is not covered by tests. Let's log the default -1 in that case for good measure. 1. The incorrect logging of the exit code in was seemingly copy/pasted into finish_command_in_signal() in `ee4512ed48` (trace2: create new combined trace facility, 2019-02-22) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:15:58 -08:00
Ævar Arnfjörð Bjarmason	85db79a96e	run-command: add braces for "if" block in wait_or_whine() Add braces to an "if" block in the wait_or_whine() function. This isn't needed now, but will make a subsequent commit easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:15:58 -08:00
Ævar Arnfjörð Bjarmason	c24b7f6736	pager: test for exit code with and without SIGPIPE Add tests for how git behaves when the pager itself exits with non-zero, as well as for us exiting with 141 when we're killed with SIGPIPE due to the pager not consuming its output. There is some recent discussion[1] about these semantics, but aside from what we want to do in the future, we should have a test for the current behavior. This test construct is stolen from `7559a1be8a` (unblock and unignore SIGPIPE, 2014-09-18). The reason not to make the test itself depend on the MINGW prerequisite is to make a subsequent commit easier to read. 1. https://lore.kernel.org/git/87o8h4omqa.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:15:58 -08:00
Ævar Arnfjörð Bjarmason	61ff12fa50	pager: refactor wait_for_pager() function Refactor the wait_for_pager() function. Since `507d7804c0` (pager: don't use unsafe functions in signal handlers, 2015-09-04) the wait_for_pager() and wait_for_pager_atexit() callers diverged on more than they shared. Let's extract the common code into a new close_pager_fds() helper, and move the parts unique to the only to callers to those functions. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:15:58 -08:00
Derrick Stolee	bc50d6c91f	commit-graph: prepare commit graph Before checking if the repository has a commit-graph loaded, be sure to run prepare_commit_graph(). This is necessary because otherwise the topo_levels slab is not initialized. As we compute topo_levels for the new commits, we iterate further into the lower layers since the first visit to each commit looks as though the topo_level is not populated. By properly initializing the topo_slab, we fix the previously broken case of a split commit graph where a base layer has the generation_data_overflow chunk. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:03:36 -08:00
Derrick Stolee	fde55b0906	commit-graph: be extra careful about mixed generations When upgrading to a commit-graph with corrected commit dates from one without, there are a few things that need to be considered. When computing generation numbers for the new commit-graph file that expects to add the generation_data chunk with corrected commit dates, we need to ensure that the 'generation' member of the commit_graph_data struct is set to zero for these commits. Unfortunately, the fallback to use topological level for generation number when corrected commit dates are not available are causing us harm here: parsing commits notices that read_generation_data is false and populates 'generation' with the topological level. The solution is to iterate through the commits, parse the commits to populate initial values, then reset the generation values to zero to trigger recalculation. This loop only occurs when the existing commit-graph data has no corrected commit dates. While this improves our situation somewhat, we have not completely solved the issue for correctly computing generation numbers for mixed layers. That follows in the next change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:03:36 -08:00
Derrick Stolee	9c2c0a8256	commit-graph: compute generations separately The compute_generation_numbers() method was introduced by `3258c663` (commit-graph: compute generation numbers, 2018-05-01) to compute what is now known as "topological levels". These are still stored in the commit-graph file for compatibility sake while `c1a09119` (commit-graph: implement corrected commit date, 2021-01-16) updated the method to also compute the new version of generation numbers: corrected commit date. It makes sense why these are grouped. They perform very similar walks of the necessary commits and compute similar maximums over each parent. However, having these two together conflates them in subtle ways that is hard to separate. In particular, the topo_level slab is used to store the topological levels in all cases, but the commit_graph_data_at(c)->generation member stores different values depending on the state of the existing commit-graph file. * If the existing commit-graph file has a "GDAT" chunk, then these values represent corrected commit dates. * If the existing commit-graph file doesn't have a "GDAT" chunk, then these values are actually the topological levels. This issue only occurs only when upgrading an existing commit-graph file into one that has the "GDAT" chunk. The current change does not resolve this upgrade problem, but splitting the implementation into two pieces here helps with that process, which will follow in the next change. The important thing this helps with is the case where the num_generation_data_overflows was being incremented incorrectly, triggering a write of the overflow chunk. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:03:36 -08:00
Derrick Stolee	448a39e65d	commit-graph: validate layers for generation data We need to be extra careful that we don't use corrected commit dates from any layer of a commit-graph chain if there is a single commit-graph file that is missing the generation_data chunk. Update validate_mixed_generation_chain() to correctly update each layer to ignore the generation_data chunk in this case. It now also returns 1 if all layers have a generation_data chunk. This return value will be used in the next change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:03:36 -08:00
Derrick Stolee	90cb1c47c7	commit-graph: always parse before commit_graph_data_at() There is a subtle failure happening when computing corrected commit dates with --split enabled. It requires a base layer needing the generation_data_overflow chunk. Then, the next layer on top erroneously thinks it needs an overflow chunk due to a bug leading to recalculating all reachable generation numbers. The output of the failure is BUG: commit-graph.c:1912: expected to write 8 bytes to chunk 47444f56, but wrote 0 instead These "expected" 8 bytes are due to re-computing the corrected commit date for the lower layer but the new layer does not need any overflow. Add a test to t5318-commit-graph.sh that demonstrates this bug. However, it does not trigger consistently with the existing code. The generation number data is stored in a slab and accessed by commit_graph_data_at(). This data is initialized when parsing a commit, but is otherwise used assuming it has been populated. The loop in compute_generation_numbers() did not enforce that all reachable commits were parsed and had correct values. This could lead to some problems when writing a commit-graph with corrected commit dates based on a commit-graph without them. It has been difficult to identify the issue here because it was so hard to reproduce. It relies on this uninitialized data having a non-zero value, but also on specifically in a way that overwrites the existing data. This patch adds the extra parse to ensure the data is filled before we compute the generation number of a commit. This triggers the new test to fail because the generation number overflow count does not match between this computation and the write for that chunk. The actual fix will follow as the next few changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:03:36 -08:00
Derrick Stolee	c4cc083169	commit-graph: use repo_parse_commit The write_commit_graph_context has a repository pointer, so use it. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 21:03:35 -08:00
Derrick Stolee	0fac156523	commit-reach: reduce requirements for remove_redundant() Remove a comment at the beggining of remove_redundant() that mentions a reordering of the input array to have the initial segment be the independent commits and the final segment be the redundant commits. While this behavior is followed in remove_redundant(), no callers rely on that behavior. Remove the final loop that copies this final segment and update the comment to match the new behavior. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-01 11:50:33 -08:00
Rafael Silva	076b444a62	worktree: teach `list` verbose mode "git worktree list" annotates each worktree according to its state such as "prunable" or "locked", however it is not immediately obvious why these worktrees are being annotated. For prunable worktrees a reason is available that is returned by should_prune_worktree() and for locked worktrees a reason might be available provided by the user via `lock` command. Let's teach "git worktree list" a --verbose mode that outputs the reason why the worktrees are being annotated. The reason is a text that can take virtually any size and appending the text on the default columned format will make it difficult to extend the command with other annotations and not fit nicely on the screen. In order to address this shortcoming the annotation is then moved to the next line indented followed by the reason If the reason is not available the annotation stays on the same line as the worktree itself. The output of "git worktree list" with verbose becomes like so: $ git worktree list --verbose ... /path/to/locked-no-reason acb124 [branch-a] locked /path/to/locked-with-reason acc125 [branch-b] locked: worktree with a locked reason /path/to/prunable-reason ace127 [branch-d] prunable: gitdir file points to non-existent location ... Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:40 -08:00
Rafael Silva	9b19a58f66	worktree: teach `list` to annotate prunable worktree The "git worktree list" command shows the absolute path to the worktree, the commit that is checked out, the name of the branch, and a "locked" annotation if the worktree is locked, however, it does not indicate whether the worktree is prunable. The "prune" command will remove a worktree if it is prunable unless `--dry-run` option is specified. This could lead to a worktree being removed without the user realizing before it is too late, in case the user forgets to pass --dry-run for instance. If the "list" command shows which worktree is prunable, the user could verify before running "git worktree prune" and hopefully prevents the working tree to be removed "accidentally" on the worse case scenario. Let's teach "git worktree list" to show when a worktree is a prunable candidate for both default and porcelain format. In the default format a "prunable" text is appended: $ git worktree list /path/to/main aba123 [main] /path/to/linked 123abc [branch-a] /path/to/prunable ace127 (detached HEAD) prunable In the --porcelain format a prunable label is added followed by its reason: $ git worktree list --porcelain ... worktree /path/to/prunable HEAD abc1234abc1234abc1234abc1234abc1234abc12 detached prunable gitdir file points to non-existent location ... Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:35 -08:00
Rafael Silva	862c723d18	worktree: teach `list --porcelain` to annotate locked worktree Commit `c57b3367be` (worktree: teach `list` to annotate locked worktree, 2020-10-11) taught "git worktree list" to annotate locked worktrees by appending "locked" text to its output, however, this is not listed in the --porcelain format. Teach "list --porcelain" to do the same and add a "locked" attribute followed by its reason, thus making both default and porcelain format consistent. If the locked reason is not available then only "locked" is shown. The output of the "git worktree list --porcelain" becomes like so: $ git worktree list --porcelain ... worktree /path/to/locked HEAD 123abcdea123abcd123acbd123acbda123abcd12 detached locked worktree /path/to/locked-with-reason HEAD abc123abc123abc123abc123abc123abc123abc1 detached locked reason why it is locked ... In porcelain mode, if the lock reason contains special characters such as newlines, they are escaped with backslashes and the entire reason is enclosed in double quotes. For example: $ git worktree list --porcelain ... locked "worktree's path mounted in\nremovable device" ... Furthermore, let's update the documentation to state that some attributes in the porcelain format might be listed alone or together with its value depending whether the value is available or not. Thus documenting the case of the new "locked" attribute. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:29 -08:00
Rafael Silva	47409e75f5	t2402: ensure locked worktree is properly cleaned up `c57b3367be` (worktree: teach `list` to annotate locked worktree, 2020-10-11) introduced a new test to ensure locked worktrees are listed with "locked" annotation. However, the test does not clean up after itself as "git worktree prune" is not going to remove the locked worktree in the first place. This not only leaves the test in an unclean state it also potentially breaks following tests that rely on the "git worktree list" output. Let's fix that by unlocking the worktree before the "prune" command. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:24 -08:00
Rafael Silva	eb36135af7	worktree: teach worktree_lock_reason() to gently handle main worktree worktree_lock_reason() aborts with an assertion failure when called on the main worktree since locking the main worktree is nonsensical. Not only is this behavior undocumented, thus callers might not even be aware that the call could potentially crash the program, but it also forces clients to be extra careful: if (!is_main_worktree(wt) && worktree_locked_reason(...)) ... Since we know that locking makes no sense in the context of the main worktree, we can simply return false for the main worktree, thus making client code less complex by eliminating the need for the callers to have inside knowledge about the implementation: if (worktree_lock_reason(...)) ... Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:20 -08:00
Rafael Silva	fc0c7d5e9e	worktree: teach worktree to lazy-load "prunable" reason Add worktree_prune_reason() to allow a caller to discover whether a worktree is prunable and the reason that it is, much like worktree_lock_reason() indicates whether a worktree is locked and the reason for the lock. As with worktree_lock_reason(), retrieve the prunable reason lazily and cache it in the `worktree` structure. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:16 -08:00
Rafael Silva	a29a8b7574	worktree: libify should_prune_worktree() As part of teaching "git worktree list" to annotate worktree that is a candidate for pruning, let's move should_prune_worktree() from builtin/worktree.c to worktree.c in order to make part of the worktree public API. should_prune_worktree() knows how to select the given worktree for pruning based on an expiration date, however the expiration value is stored in a static file-scope variable and it is not local to the function. In order to move the function, teach should_prune_worktree() to take the expiration date as an argument and document the new parameter that is not immediately obvious. Also, change the function comment to clearly state that the worktree's path is returned in `wtpath` argument. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-30 09:57:08 -08:00
Dimitriy Ryazantcev	a093f0ba95	l10n: ru.po: update Russian translation Kudos to Philipp Bartsch for whitespace fixes and his helper script[1]. [1]: https://git.grmr.de/phil/pocheck Signed-off-by: Dimitriy Ryazantcev <dimitriy.ryazantcev@gmail.com>	2021-01-29 21:45:17 +02:00
Taylor Blau	6885cd7dc5	t5325: check both on-disk and in-memory reverse index Right now, the test suite can be run with 'GIT_TEST_WRITE_REV_INDEX=1' in the environment, which causes all operations which write a pack to also write a .rev file. To prepare for when that eventually becomes the default, we should continue to test the in-memory reverse index, too, in order to avoid losing existing coverage. Unfortunately, explicit existing coverage is rather sparse, so only a basic test is added that compares the result of git rev-list --objects --no-object-names --all \| git cat-file --batch-check='%(objectsize:disk) %(objectname)' with and without an on-disk reverse index. Suggested-by: Jeff King <peff@peff.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 22:51:51 -08:00
Jeff King	018b9deba5	pretty: lazy-load commit data when expanding user-format When we expand a user-format, we try to avoid work that isn't necessary for the output. For instance, we don't bother parsing the commit header until we know we need the author, subject, etc. But we do always load the commit object's contents from disk, even if the format doesn't require it (e.g., just "%H"). Traditionally this didn't matter much, because we'd have loaded it as part of the traversal anyway, and we'd typically have those bytes attached to the commit struct (or these days, cached in a commit-slab). But when we have a commit-graph, we might easily get to the point of pretty-printing a commit without ever having looked at the actual object contents. We should push off that load (and reencoding) until we're certain that it's needed. I think the results of p4205 show the advantage pretty clearly (we serve parent and tree oids out of the commit struct itself, so they benefit as well): # using git.git as the test repo Test HEAD^ HEAD ---------------------------------------------------------------------- 4205.1: log with %H 0.40(0.39+0.01) 0.03(0.02+0.01) -92.5% 4205.2: log with %h 0.45(0.44+0.01) 0.09(0.09+0.00) -80.0% 4205.3: log with %T 0.40(0.39+0.00) 0.04(0.04+0.00) -90.0% 4205.4: log with %t 0.46(0.46+0.00) 0.09(0.08+0.01) -80.4% 4205.5: log with %P 0.39(0.39+0.00) 0.03(0.03+0.00) -92.3% 4205.6: log with %p 0.46(0.46+0.00) 0.10(0.09+0.00) -78.3% 4205.7: log with %h-%h-%h 0.52(0.51+0.01) 0.15(0.14+0.00) -71.2% 4205.8: log with %an-%ae-%s 0.42(0.41+0.00) 0.42(0.41+0.01) +0.0% # using linux.git as the test repo Test HEAD^ HEAD ---------------------------------------------------------------------- 4205.1: log with %H 7.12(6.97+0.14) 0.76(0.65+0.11) -89.3% 4205.2: log with %h 7.35(7.19+0.16) 1.30(1.19+0.11) -82.3% 4205.3: log with %T 7.58(7.42+0.15) 1.02(0.94+0.08) -86.5% 4205.4: log with %t 8.05(7.89+0.15) 1.55(1.41+0.13) -80.7% 4205.5: log with %P 7.12(7.01+0.10) 0.76(0.69+0.07) -89.3% 4205.6: log with %p 7.38(7.27+0.10) 1.32(1.20+0.12) -82.1% 4205.7: log with %h-%h-%h 7.81(7.67+0.13) 1.79(1.67+0.12) -77.1% 4205.8: log with %an-%ae-%s 7.90(7.74+0.15) 7.81(7.66+0.15) -1.1% I added the final test to show where we don't improve (the 1% there is just lucky noise), but also as a regression test to make sure we're not doing anything stupid like loading the commit multiple times when there are several placeholders that need it. Reported-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 14:07:35 -08:00
Johannes Schindelin	f7d42ceec5	rebase -i: do leave commit message intact in fixup! chains In `6e98de72c0` (sequencer (rebase -i): add support for the 'fixup' and 'squash' commands, 2017-01-02), this developer introduced a change of behavior by mistake: when encountering a `fixup!` commit (or multiple `fixup!` commits) without any `squash!` commit thrown in, the final `git commit` was invoked with `--cleanup=strip`. Prior to that commit, the commit command had been called without that `--cleanup` option. Since we explicitly read the original commit message from a file in that case, there is really no sense in forcing that clean-up. We actually need to actively suppress that clean-up lest a configured `commit.cleanup` may interfere with what we want to do: leave the commit message unchanged. Reported-by: Vojtěch Knyttl <vojtech@knyt.tl> Helped-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:12:37 -08:00
Jeff King	30291525d9	t0000: consistently use single quotes for outer tests When we use the sub-test helpers, we end up defining one shell snippet inside another shell snippet. So if we use single-quotes for the outer snippet, we have to use double-quotes within the inner snippet (it's included as here-doc within the outer snippet, but using a single quote would end the outer snippet early). Or vice versa we can use double quotes for the outer snippet, but then single quotes in the inner. We have some of each in the script, and neither is wrong. But it would be nice to be consistent unless there is a good reason not to. Using single quotes for the outer script is preferable, because it requires less metacharacter quoting overall. For example, in: test_expect_success 'outer' ' run_sub_test_lib_test ... <<-\EOF echo $foo && test_expect_success "inner" " echo \$bar " EOF ' we need only quote inside "inner", but not inside "outer" or the here-doc. Whereas if we flip them, we have to quote in both places: test_expect_success 'outer' " run_sub_test_lib_test ... <<-\EOF echo \$foo && test_expect_success 'inner' ' echo \$bar ' EOF " The exception is when we need a literal single-quote in an expected output here-doc. There we can either use outer double-quotes, or just use ${SQ} within the doc. I chose the latter for consistency (within this test, but also with other test scripts that face the same problem). There is one other interesting case, which is some tests that do: test_expect_success ... " do_something --run='"'!3'"' " This is rather confusing to read, but is correct. The outer script sees '!3' in single-quotes, as does the eval'd snippet. This is perhaps being overly cautious. In many interactive shells, an exclamation triggers history expansion even inside double quotes, but that is not generally true in non-interactive shells. There's some conflicting information here. Commit `784ce03d55` (t4216: avoid unnecessary subshell in test_bloom_filters_not_used, 2020-05-19) reports it as a problem with OpenBSD 6.7's /bin/sh. However, we have many instances in this script of prereqs like !LAZY_TRUE, which haven't been a problem. I left them un-escaped here to test out this theory. It's much nicer if we can not worry about this as a portability issue, so it's worth knowing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:06:26 -08:00
Jeff King	080e295248	t0000: run cleaning test inside sub-test Our check of test_when_finished is done directly in the main script, and if we failed to clean, we complain and exit immediately. It's nicer to signal a test failure here, for a few reasons: - this gives better output to the user when run under a TAP harness like "prove" - constency; it's the only test left in the file that behaves this way - half of its "if" conditional is nonsense anyway; it picked up a reference to GIT_TEST_FAIL_PREREQS_INTERNAL in `dfe1a17df9` (tests: add a special setup where prerequisites fail, 2019-05-13) along with its neighbors, even though it has nothing to do with that flag We could actually do this without a sub-test at all, and just put our two tests (one to do cleanup, and one to check that it happened) in the main script. But doing it in a subtest is conceptually cleaner (from the perspective of the main test script, we are checking only one thing), and it remains consistent with the "cleanup when failing" test directly after it, which has to happen in a sub-test (to avoid the main script complaining of the failed test). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:06:26 -08:00
Jeff King	efd2600e6f	t0000: run prereq tests inside sub-test We test the behavior of prerequisites in t0000 by setting up fake ones in the main test script, trying to run some tests, and then seeing if those tests impacted the environment correctly. If they didn't, then we write a message and manually call exit. Instead, let's push these down into a sub-test, like many of the other tests covering the framework itself. This has a few advantages: - it does not pollute the test output with mention of skipped tests (that we know are uninteresting -- the point of the test was to see that these are skipped). - when running in a TAP harness, we get a useful test failure message (whereas when the script exits early, a tool like "prove" simply says "Dubious, test returned 1"). - we do not have to worry about different test environments, such as when GIT_TEST_FAIL_PREREQS_INTERNAL is set. Our sub-test helpers already give us a known environment. - the tests themselves are a bit easier to read, as we can just check the test-framework output to see what happened (and get the usual test_cmp diff if it failed) A few notes on the implementation: - we could do one sub-test per each individual test_expect_success. I broke it up here into a few logical groups, as I think this makes it more readable - the original tests modified environment variables inside the test bodies. Instead, I've used "true" as the body of a test we expect to run and "false" otherwise. Technically this does not confirm that the body of the "true" test actually ran. We are trusting the framework output to believe that it truly ran, which is sufficient for these tests. And I think the end result is much simpler to follow. - the nested_prereq test uses a few bare "test -f" calls; I converted these to our usual test_path_is_* helpers while moving the code around. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:06:26 -08:00
Jeff King	03efadb774	t0000: keep clean-up tests together We check that test_when_finished cleans up after a test, and that it runs even after a failure. Those two were originally adjacent, but got split apart by the new test added in `477dcaddb6` (tests: do not let lazy prereqs inside `test_expect_*` turn off tracing, 2020-03-26), and then further by more lazy-prereq tests. Let's move them back together. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:06:25 -08:00
Jeff King	8380dcd700	oid_pos(): access table through const pointers When we are looking up an oid in an array, we obviously don't need to write to the array. Let's mark it as const in the function interfaces, as well as in the local variables we use to derference the void pointer (note a few cases use pointers-to-pointers, so we mark everything const). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:03:26 -08:00
Jeff King	45ee13b942	hash_pos(): convert to oid_pos() All of our callers are actually looking up an object_id, not a bare hash. Likewise, the arrays they are looking in are actual arrays of object_id (not just raw bytes of hashes, as we might find in a pack .idx; those are handled by bsearch_hash()). Using an object_id gives us more type safety, and makes the callers slightly shorter. It also gets rid of the word "sha1" from several access functions, though we could obviously also rename those with s/sha1/hash/. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 12:02:39 -08:00
Jeff King	680ff910b0	rerere: use strmap to store rerere directories We store a struct for each directory we access under .git/rr-cache. The structs are kept in an array sorted by the binary hash associated with their name (and we do lookups with a binary search). This works OK, but there are a few small downsides: - the amount of code isn't huge, but it's more than we'd need using one of our other stock data structures - the insertion into a sorted array is quadratic (though in practice it's unlikely anybody has enough conflicts for this to matter) - it's intimately tied to the representation of an object hash. This isn't a big deal, as the conflict ids we generate use the same hash, but it produces a few awkward bits (e.g., we are the only user of hash_pos() that is not using object_id). Let's instead just treat the directory names as strings, and store them in a strmap. This is less code, and removes the use of hash_pos(). Insertion is now non-quadratic, though we probably use a bit more memory. Besides the hash table overhead, and storing hex bytes instead of a binary hash, we actually store each name twice. Other code expects to access the name of a rerere_dir struct from the struct itself, so we need a copy there. But strmap keeps its own copy of the name, as well. Using a bare hashmap instead of strmap means we could use the name for both, but at the cost of extra code (e.g., our own comparison function). Likewise, strmap has a feature to use a pointer to the in-struct name at the cost of a little extra code. I didn't do either here, as simple code seemed more important than squeezing out a few bytes of efficiency. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 11:26:20 -08:00
Jeff King	098c173f2b	rerere: tighten rr-cache dirname check We check only that get_sha1_hex() doesn't complain, which means we'd match an all-hex name with trailing cruft after it. This probably doesn't matter much in practice, since there shouldn't be anything else in the rr-cache directory, but it could possibly cause us to mix up sha1 and sha256 entries (which also shouldn't be intermingled, but could be leftovers from a repository conversion). Note that "get_sha1_hex()" is a confusing historical name. It is actually using the_hash_algo, so it would be sha256 in a sha256 repo. We'll switch to using parse_oid_hex(), because that conveniently advances our pointer. But it also gets rid of the sha1 name. Arguably it's a little funny to use "object_id" here for something that isn't actually naming an object, but it's unlikely to be a problem (and is contained in a single function). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 11:25:43 -08:00
Jeff King	2bc1a87e42	rerere: check dirname format while iterating rr_cache directory In rerere_gc(), we walk over the .git/rr_cache directory and create a struct for each entry we find. We feed any name we get from readdir() to find_rerere_dir(), which then calls get_sha1_hex() on it (since we use the binary hash as a lookup key). If that fails (i.e., the directory name is not what we expected), it returns NULL. But the comment in find_rerere_dir() says "BUG". It _would_ be a bug for the call from new_rerere_id_hex(), the only other code path, to fail here; it's generating the hex internally. But the call in rerere_gc() is using it say "is this a plausible directory name". Let's instead have rerere_gc() do its own "is this plausible" check. That has two benefits: - we can now reliably BUG() inside find_rerere_dir(), which would catch bugs in the other code path (and we now will never return NULL from the function, which makes it easier to see that a rerere_id struct will always have a non-NULL "collection" field). - it makes the use of the binary hash an implementation detail of find_rerere_dir(), not known by callers. That will free us up to change it in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 11:21:27 -08:00
Jeff King	98c431b6f9	commit_graft_pos(): take an oid instead of a bare hash All of our callers have an object_id, and are just dereferencing the hash field to pass to us. Let's take the actual object_id instead. We still access the hash to pass to hash_pos, but it's a step in the right direction. This makes the callers slightly simpler, but also gets rid of the untyped pointer, as well as the now-inaccurate name "sha1". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 11:21:07 -08:00
Jacob Vosmaer	ad5df6b782	upload-pack.c: fix filter spec quoting bug Fix a bug in upload-pack.c that occurs when you combine partial clone and uploadpack.packObjectsHook. You can reproduce it as follows: git clone -u 'git -c uploadpack.allowfilter '\ '-c uploadpack.packobjectshook=env '\ 'upload-pack' --filter=blob:none --no-local \ src.git dst.git Be careful with the line endings because this has a long quoted string as the -u argument. The error I get when I run this is: Cloning into '/tmp/broken'... remote: fatal: invalid filter-spec ''blob:none'' error: git upload-pack: git-pack-objects died with error. fatal: git upload-pack: aborting due to possible repository corruption on the remote side. remote: aborting due to possible repository corruption on the remote side. fatal: early EOF fatal: index-pack failed The problem is caused by unneeded quoting. This bug was already present in `10ac85c785` (upload-pack: add object filtering for partial clone, 2017-12-08) when the server side filter support was introduced. In fact, in `10ac85c785` this was broken regardless of uploadpack.packObjectsHook. Then in `0b6069fe0a` (fetch-pack: test support excluding large blobs, 2017-12-08) the quoting was removed but only behind a conditional that depends on whether uploadpack.packObjectsHook is set. Because uploadpack.packObjectsHook is apparently rarely used, nobody noticed the problematic quoting could still happen. Remove the conditional quoting and add a test for partial clone in t5544-pack-objects-hook. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-28 09:40:24 -08:00
Jeff King	765dc16888	git-compat-util: always enable variadic macros We allow variadic macros in the code base, but only if there is fallback code for platforms that lack it. This leads to some annoyances: - the code is more complicated because of the fallbacks (e.g., trace_printf(), etc, is implemented twice with a set of parallel wrappers). - some constructs are just impossible and we've had to live without them (e.g., a cross between FLEX_ALLOC and xstrfmt) Since this feature is present in C99, we may be able to start counting on it being available everywhere. Let's start with a weather balloon patch to find out. This patch makes the absolute minimal change by always setting HAVE_VARIADIC_MACROS. If somebody runs into a platform where it's a problem, they can undo it by commenting out the define. Likewise, if we have to revert this, it would be quite unlikely to cause conflicts. Once we feel comfortable that this is the right direction, then we can start ripping out all the spots that actually look at the flag, and removing the dead code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-27 22:14:37 -08:00
Johannes Schindelin	679b5916cd	range-diff/format-patch: refactor check for commit range Currently, when called with exactly two arguments, `git range-diff` tests for a literal `..` in each of the two. Likewise, the argument provided via `--range-diff` to `git format-patch` is checked in the same manner. However, `<commit>^!` is a perfectly valid commit range, equivalent to `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of gitrevisions[7]. In preparation for allowing more sophisticated ways to specify commit ranges, let's refactor the check into its own function. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-27 22:01:49 -08:00
SZEDER Gábor	134768cf53	test-lib: prevent '--stress-jobs=X' from being ignored './t1234-foo.sh --stress-jobs=X ...' is supposed to run that test script in X parallel jobs, but the number of jobs specified on the command line is entirely ignored if other '--stress'-related options follow. I.e. both './t1234-foo.sh --stress-jobs=X --stress-limit=Y' and './t1234-foo.sh --stress-jobs=X --stress' fall back to using twice the number of CPUs parallel jobs instead. The former has been broken since commit `de69e6f6c9` (tests: let --stress-limit=<N> imply --stress, 2019-03-03) [1], which started to unconditionally overwrite the $stress variable holding the specified number of jobs in its effort to imply '--stress'. The latter has been broken since `f545737144` (tests: introduce --stress-jobs=<N>, 2019-03-03), because it didn't consider that handling '--stress' will overwrite that variable as well. We could fix this by being more careful about (over)writing that $stress variable and checking first whether it has already been set. But I think it's cleaner to use a dedicated variable to hold the number of specified parallel jobs, so let's do that instead. [1] In `de69e6f6c9` there was no '--stress-jobs=X' option yet, the number of parallel jobs had to be specified via '--stress=X', so, strictly speaking, `de69e6f6c9` broke './t1234-foo.sh --stress=X --stress-limit=Y'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-26 17:58:33 -08:00
Ævar Arnfjörð Bjarmason	15c9649730	grep/log: remove hidden --debug and --grep-debug options Remove the hidden "grep --debug" and "log --grep-debug" options added in `17bf35a3c7` (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in `e01b4dab01` (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is `c581e4a749` (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier `b65abcafc7` (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug $ -e foo --and -e bar $ --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-26 11:36:20 -08:00
Taylor Blau	ec8e7760ac	pack-revindex: ensure that on-disk reverse indexes are given precedence When an on-disk reverse index exists, there is no need to generate one in memory. In fact, doing so can be slow, and require large amounts of the heap. Let's make sure that we treat the on-disk reverse index with precedence (i.e., that when it exists, we don't bother trying to generate an equivalent one in memory) by teaching Git how to conditionally die() when generating a reverse index in memory. Then, add a test to ensure that when (a) an on-disk reverse index exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we do not die, implying that we read from the on-disk one. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:44 -08:00
Taylor Blau	e8c58f894b	t: support GIT_TEST_WRITE_REV_INDEX Add a new option that unconditionally enables the pack.writeReverseIndex setting in order to run the whole test suite in a mode that generates on-disk reverse indexes. Additionally, enable this mode in the second run of tests under linux-gcc in 'ci/run-build-and-tests.sh'. Once on-disk reverse indexes are proven out over several releases, we can change the default value of that configuration to 'true', and drop this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:44 -08:00
Taylor Blau	35a8a3547a	t: prepare for GIT_TEST_WRITE_REV_INDEX In the next patch, we'll add support for unconditionally enabling the 'pack.writeReverseIndex' setting with a new GIT_TEST_WRITE_REV_INDEX environment variable. This causes a little bit of fallout with tests that, for example, compare the list of files in the pack directory being unprepared to see .rev files in its output. Those locations can be cleaned up to look for specific file extensions, rather than take everything in the pack directory (for instance) and then grep out unwanted items. Once the pack.writeReverseIndex option has been thoroughly tested, we will default it to 'true', removing GIT_TEST_WRITE_REV_INDEX, and making it possible to revert this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:44 -08:00
Taylor Blau	1615c567b8	Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Now that the pack.writeReverseIndex configuration is respected in both 'git index-pack' and 'git pack-objects' (and therefore, all of their callers), we can safely advertise it for use in the git-config manual. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:44 -08:00
Taylor Blau	c97733435a	builtin/pack-objects.c: respect 'pack.writeReverseIndex' Now that we have an implementation that can write the new reverse index format, enable writing a .rev file in 'git pack-objects' by consulting the pack.writeReverseIndex configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Taylor Blau	e37d0b8730	builtin/index-pack.c: write reverse indexes Teach 'git index-pack' to optionally write and verify reverse index with '--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex' configuration option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Taylor Blau	84d544943c	builtin/index-pack.c: allow stripping arbitrary extensions To derive the filename for a .idx file, 'git index-pack' uses derive_filename() to strip the '.pack' suffix and add the new suffix. Prepare for stripping off suffixes other than '.pack' by making the suffix to strip a parameter of derive_filename(). In order to make this consistent with the "suffix" parameter which does not begin with a ".", an additional check in derive_filename. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Taylor Blau	8ef50d9958	pack-write.c: prepare to write 'pack-*.rev' files This patch prepares for callers to be able to write reverse index files to disk. It adds the necessary machinery to write a format-compliant .rev file from within 'write_rev_file()', which is called from 'finish_tmp_packfile()'. Similar to the process by which the reverse index is computed in memory, these new paths also have to sort a list of objects by their offsets within a packfile. These new paths use a qsort() (as opposed to a radix sort), since our specialized radix sort requires a full revindex_entry struct per object, which is more memory than we need to allocate. The qsort is obviously slower, but the theoretical slowdown would require a repository with a large amount of objects, likely implying that the time spent in, say, pack-objects during a repack would dominate the overall runtime. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Taylor Blau	2f4ba2a867	packfile: prepare for the existence of '.rev' files Specify the format of the on-disk reverse index 'pack-.rev' file, as well as prepare the code for the existence of such files. The reverse index maps from pack relative positions (i.e., an index into the array of object which is sorted by their offsets within the packfile) to their position within the 'pack-.idx' file. Today, this is done by building up a list of (off_t, uint32_t) tuples for each object (the off_t corresponding to that object's offset, and the uint32_t corresponding to its position in the index). To convert between pack and index position quickly, this array of tuples is radix sorted based on its offset. This has two major drawbacks: First, the in-memory cost scales linearly with the number of objects in a pack. Each 'struct revindex_entry' is sizeof(off_t) + sizeof(uint32_t) + padding bytes for a total of 16. To observe this, force Git to load the reverse index by, for e.g., running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking for a single object in a fresh clone of the kernel, Git needs to allocate 120+ MB of memory in order to hold the reverse index in memory. Second, the cost to sort also scales with the size of the pack. Luckily, this is a linear function since 'load_pack_revindex()' uses a radix sort, but this cost still must be paid once per pack per process. As an example, it takes ~60x longer to print the _size_ of an object as it does to print that entire object's _contents_: Benchmark #1: git.compile cat-file --batch <obj Time (mean ± σ): 3.4 ms ± 0.1 ms [User: 3.3 ms, System: 2.1 ms] Range (min … max): 3.2 ms … 3.7 ms 726 runs Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj Time (mean ± σ): 210.3 ms ± 8.9 ms [User: 188.2 ms, System: 23.2 ms] Range (min … max): 193.7 ms … 224.4 ms 13 runs Instead, avoid computing and sorting the revindex once per process by writing it to a file when the pack itself is generated. The format is relatively straightforward. It contains an array of uint32_t's, the length of which is equal to the number of objects in the pack. The ith entry in this table contains the index position of the ith object in the pack, where "ith object in the pack" is determined by pack offset. One thing that the on-disk format does _not_ contain is the full (up to) eight-byte offset corresponding to each object. This is something that the in-memory revindex contains (it stores an off_t in 'struct revindex_entry' along with the same uint32_t that the on-disk format has). Omit it in the on-disk format, since knowing the index position for some object is sufficient to get a constant-time lookup in the pack-.idx file to ask for an object's offset within the pack. This trades off between the on-disk size of the 'pack-.rev' file for runtime to chase down the offset for some object. Even though the lookup is constant time, the constant is heavier, since it can potentially involve two pointer walks in v2 indexes (one to access the 4-byte offset table, and potentially a second to access the double wide offset table). Consider trying to map an object's pack offset to a relative position within that pack. In a cold-cache scenario, more page faults occur while switching between binary searching through the reverse index and searching through the .idx file for an object's offset. Sure enough, with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after 'sync'ing), printing out the entire object's contents is still marginally faster than printing its size: Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null Time (mean ± σ): 22.6 ms ± 0.5 ms [User: 2.4 ms, System: 7.9 ms] Range (min … max): 21.4 ms … 23.5 ms 41 runs Benchmark #2: git.compile cat-file --batch <obj >/dev/null Time (mean ± σ): 17.2 ms ± 0.7 ms [User: 2.8 ms, System: 5.5 ms] Range (min … max): 15.6 ms … 18.2 ms 45 runs (Numbers taken in the kernel after cheating and using the next patch to generate a reverse index). There are a couple of approaches to improve cold cache performance not pursued here: - We could include the object offsets in the reverse index format. Predictably, this does result in fewer page faults, but it triples the size of the file, while simultaneously duplicating a ton of data already available in the .idx file. (This was the original way I implemented the format, and it did show `--batch-check='%(objectsize:disk)'` winning out against `--batch`.) On the other hand, this increase in size also results in a large block-cache footprint, which could potentially hurt other workloads. - We could store the mapping from pack to index position in more cache-friendly way, like constructing a binary search tree from the table and writing the values in breadth-first order. This would result in much better locality, but the price you pay is trading O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you can no longer directly index the table). So, neither of these approaches are taken here. (Thankfully, the format is versioned, so we are free to pursue these in the future.) But, cold cache performance likely isn't interesting outside of one-off cases like asking for the size of an object directly. In real-world usage, Git is often performing many operations in the revindex (i.e., asking about many objects rather than a single one). The trade-off is worth it, since we will avoid the vast majority of the cost of generating the revindex that the extra pointer chase will look like noise in the following patch's benchmarks. This patch describes the format and prepares callers (like in pack-revindex.c) to be able to read *.rev files once they exist. An implementation of the writer will appear in the next patch, and callers will gradually begin to start using the writer in the patches that follow after that. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Junio C Hamano	e6362826a0	The fourth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 14:19:20 -08:00
Junio C Hamano	b7bb322cba	Merge branch 'ab/mailmap-fixup' Follow-up fixes and improvements to ab/mailmap topic. * ab/mailmap-fixup: t4203: make blame output massaging more robust mailmap doc: use correct environment variable 'GIT_WORK_TREE' t4203: stop losing return codes of git commands test-lib-functions.sh: fix usage for test_commit()	2021-01-25 14:19:20 -08:00
Junio C Hamano	bcaaf972e6	Merge branch 'tb/pack-revindex-api' Abstract accesses to in-core revindex that allows enumerating objects stored in a packfile in the order they appear in the pack, in preparation for introducing an on-disk precomputed revindex. * tb/pack-revindex-api: (21 commits) for_each_object_in_pack(): clarify pack vs index ordering pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' pack-revindex: hide the definition of 'revindex_entry' pack-revindex: remove unused 'find_revindex_position()' pack-revindex: remove unused 'find_pack_revindex()' builtin/gc.c: guess the size of the revindex for_each_object_in_pack(): convert to new revindex API unpack_entry(): convert to new revindex API packed_object_info(): convert to new revindex API retry_bad_packed_offset(): convert to new revindex API get_delta_base_oid(): convert to new revindex API rebuild_existing_bitmaps(): convert to new revindex API try_partial_reuse(): convert to new revindex API get_size_by_pos(): convert to new revindex API show_objects_for_type(): convert to new revindex API bitmap_position_packfile(): convert to new revindex API check_object(): convert to new revindex API write_reused_pack_verbatim(): convert to new revindex API write_reused_pack_one(): convert to new revindex API write_reuse_object(): convert to new revindex API ...	2021-01-25 14:19:20 -08:00
Junio C Hamano	381dac2349	Merge branch 'ab/coc-update-to-2.0' Update the Code-of-conduct to version 2.0 from the upstream (we've been using version 1.4). * ab/coc-update-to-2.0: CoC: update to version 2.0 + local changes CoC: explicitly take any whitespace breakage CoC: Update word-wrapping to match upstream	2021-01-25 14:19:19 -08:00
Junio C Hamano	294e949fa2	Merge branch 'ps/config-env-pairs' Introduce two new ways to feed configuration variable-value pairs via environment variables, and tweak the way GIT_CONFIG_PARAMETERS encodes variable/value pairs to make it more robust. * ps/config-env-pairs: config: allow specifying config entries via envvar pairs environment: make `getenv_safe()` a public function config: store "git -c" variables using more robust format config: parse more robust format in GIT_CONFIG_PARAMETERS config: extract function to parse config pairs quote: make sq_dequote_step() a public function config: add new way to pass config via `--config-env` git: add `--super-prefix` to usage string	2021-01-25 14:19:19 -08:00
Junio C Hamano	7eefa1349b	Merge branch 'cc/write-promisor-file' A bit of code refactoring. * cc/write-promisor-file: pack-write: die on error in write_promisor_file() fetch-pack: refactor writing promisor file fetch-pack: rename helper to create_promisor_file()	2021-01-25 14:19:19 -08:00
Junio C Hamano	8b48981987	Merge branch 'jx/bundle' "git bundle" learns "--stdin" option to read its refs from the standard input. Also, it now does not lose refs whey they point at the same object. * jx/bundle: bundle: arguments can be read from stdin bundle: lost objects when removing duplicate pendings test: add helper functions for git-bundle	2021-01-25 14:19:19 -08:00
Junio C Hamano	42342b3ee6	Merge branch 'ab/mailmap' Clean-up docs, codepaths and tests around mailmap. * ab/mailmap: (22 commits) shortlog: remove unused(?) "repo-abbrev" feature mailmap doc + tests: document and test for case-insensitivity mailmap tests: add tests for empty "<>" syntax mailmap tests: add tests for whitespace syntax mailmap tests: add a test for comment syntax mailmap doc + tests: add better examples & test them tests: refactor a few tests to use "test_commit --append" test-lib functions: add an --append option to test_commit test-lib functions: add --author support to test_commit test-lib functions: document arguments to test_commit test-lib functions: expand "test_commit" comment template mailmap: test for silent exiting on missing file/blob mailmap tests: get rid of overly complex blame fuzzing mailmap tests: add a test for "not a blob" error mailmap tests: remove redundant entry in test mailmap tests: improve --stdin tests mailmap tests: modernize syntax & test idioms mailmap tests: use our preferred whitespace syntax mailmap doc: start by mentioning the comment syntax check-mailmap doc: note config options ...	2021-01-25 14:19:19 -08:00
Junio C Hamano	60ecad090d	Merge branch 'ps/fetch-atomic' "git fetch" learns to treat ref updates atomically in all-or-none fashion, just like "git push" does, with the new "--atomic" option. * ps/fetch-atomic: fetch: implement support for atomic reference updates fetch: allow passing a transaction to `s_update_ref()` fetch: refactor `s_update_ref` to use common exit path fetch: use strbuf to format FETCH_HEAD updates fetch: extract writing to FETCH_HEAD	2021-01-25 14:19:19 -08:00
Junio C Hamano	b69bed22c5	Merge branch 'jk/log-cherry-pick-duplicate-patches' When more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side. Now it does. * jk/log-cherry-pick-duplicate-patches: patch-ids: handle duplicate hashmap entries	2021-01-25 14:19:19 -08:00
Junio C Hamano	27d7c8599b	Merge branch 'js/default-branch-name-tests-final-stretch' Prepare tests not to be affected by the name of the default branch "git init" creates. * js/default-branch-name-tests-final-stretch: (28 commits) tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed t99: adjust the references to the default branch name "main" tests(git-p4): transition to the default branch name `main` t9[5-7]: adjust the references to the default branch name "main" t9[0-4]: adjust the references to the default branch name "main" t8: adjust the references to the default branch name "main" t7[5-9]: adjust the references to the default branch name "main" t7[0-4]: adjust the references to the default branch name "main" t6[4-9]: adjust the references to the default branch name "main" t64: preemptively adjust alignment to prepare for `master` -> `main` t6[0-3]: adjust the references to the default branch name "main" t5[6-9]: adjust the references to the default branch name "main" t55[4-9]: adjust the references to the default branch name "main" t55[23]: adjust the references to the default branch name "main" t551: adjust the references to the default branch name "main" t550: adjust the references to the default branch name "main" t5503: prepare aligned comment for replacing `master` with `main` t5[0-4]: adjust the references to the default branch name "main" t5323: prepare centered comment for `master` -> `main` t4: adjust the references to the default branch name "main" ...	2021-01-25 14:19:18 -08:00
Junio C Hamano	440acfbe0c	Merge branch 'dl/reflog-with-single-entry' After expiring a reflog and making a single commit, the reflog for the branch would record a single entry that knows both @{0} and @{1}, but we failed to answer "what commit were we on?", i.e. @{1} * dl/reflog-with-single-entry: refs: allow @{n} to work with n-sized reflog refs: factor out set_read_ref_cutoffs()	2021-01-25 14:19:18 -08:00
Junio C Hamano	0806279428	Merge branch 'sj/untracked-files-in-submodule-directory-is-not-dirty' "git diff" showed a submodule working tree with untracked cruft as "Submodule commit <objectname>-dirty", but a natural expectation is that the "-dirty" indicator would align with "git describe --dirty", which does not consider having untracked files in the working tree as source of dirtiness. The inconsistency has been fixed. * sj/untracked-files-in-submodule-directory-is-not-dirty: diff: do not show submodule with untracked files as "-dirty"	2021-01-25 14:19:18 -08:00
Junio C Hamano	dfcd905069	Merge branch 'jc/deprecate-pack-redundant' Warn loudly when the "pack-redundant" command, which has been left stale with almost unusable performance issues, gets used, as we no longer want to recommend its use (instead just "repack -d" instead). * jc/deprecate-pack-redundant: pack-redundant: gauge the usage before proposing its removal	2021-01-25 14:19:18 -08:00
Junio C Hamano	c7b1aaf6d6	Merge branch 'jk/forbid-lf-in-git-url' Newline characters in the host and path part of git:// URL are now forbidden. * jk/forbid-lf-in-git-url: fsck: reject .gitmodules git:// urls with newlines git_connect_git(): forbid newlines in host and path	2021-01-25 14:19:17 -08:00
Junio C Hamano	9e409d7e07	Merge branch 'ab/branch-sort' The implementation of "git branch --sort" wrt the detached HEAD display has always been hacky, which has been cleaned up. * ab/branch-sort: branch: show "HEAD detached" first under reverse sort branch: sort detached HEAD based on a flag ref-filter: move ref_sorting flags to a bitfield ref-filter: move "cmp_fn" assignment into "else if" arm ref-filter: add braces to if/else if/else chain branch tests: add to --sort tests branch: change "--local" to "--list" in comment	2021-01-25 14:19:17 -08:00
Junio C Hamano	a5ac31b5b1	Merge branch 'en/diffcore-rename' File-level rename detection updates. * en/diffcore-rename: diffcore-rename: remove unnecessary duplicate entry checks diffcore-rename: accelerate rename_dst setup diffcore-rename: simplify and accelerate register_rename_src() t4058: explore duplicate tree entry handling in a bit more detail t4058: add more tests and documentation for duplicate tree entry handling diffcore-rename: reduce jumpiness in progress counters diffcore-rename: simplify limit check diffcore-rename: avoid usage of global in too_many_rename_candidates() diffcore-rename: rename num_create to num_destinations	2021-01-25 14:19:17 -08:00
Junio C Hamano	58e2ce9112	Merge branch 'ma/more-opaque-lock-file' Code clean-up. * ma/more-opaque-lock-file: read-cache: try not to peek into `struct {lock_,temp}file` refs/files-backend: don't peek into `struct lock_file` midx: don't peek into `struct lock_file` commit-graph: don't peek into `struct lock_file` builtin/gc: don't peek into `struct lock_file`	2021-01-25 14:19:17 -08:00
Junio C Hamano	2856089e36	Merge branch 'en/merge-ort-3' Rename detection is added to the "ORT" merge strategy. * en/merge-ort-3: merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions merge-ort: add implementation of rename/delete conflicts merge-ort: add implementation of both sides renaming differently merge-ort: add implementation of both sides renaming identically merge-ort: add basic outline for process_renames() merge-ort: implement compare_pairs() and collect_renames() merge-ort: implement detect_regular_renames() merge-ort: add initial outline for basic rename detection merge-ort: add basic data structures for handling renames	2021-01-25 14:19:17 -08:00
Junio C Hamano	c7d6d419b0	Merge branch 'ab/mktag' "git mktag" validates its input using its own rules before writing a tag object---it has been updated to share the logic with "git fsck". * ab/mktag: (23 commits) mktag: add a --[no-]strict option mktag: mark strings for translation mktag: convert to parse-options mktag: allow omitting the header/body \n separator mktag: allow turning off fsck.extraHeaderEntry fsck: make fsck_config() re-usable mktag: use fsck instead of custom verify_tag() mktag: use puts(str) instead of printf("%s\n", str) mktag: remove redundant braces in one-line body "if" mktag: use default strbuf_read() hint mktag tests: test verify_object() with replaced objects mktag tests: improve verify_object() test coverage mktag tests: test "hash-object" compatibility mktag tests: stress test whitespace handling mktag tests: run "fsck" after creating "mytag" mktag tests: don't create "mytag" twice mktag tests: don't redirect stderr to a file needlessly mktag tests: remove needless SHA-1 hardcoding mktag tests: use "test_commit" helper mktag tests: don't needlessly use a subshell ...	2021-01-25 14:19:17 -08:00
Ævar Arnfjörð Bjarmason	95ca1f987e	grep/pcre2: better support invalid UTF-8 haystacks Improve the support for invalid UTF-8 haystacks given a non-ASCII needle when using the PCREv2 backend. This is a more complete fix for a bug I started to fix in `870eea8166` (grep: do not enter PCRE2_UTF mode on fixed matching, 2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we can make use of it. This fixes the sort of case described in `8a5999838e` (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.: - The subject string is non-ASCII (e.g. "ævar") - We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C" - We are using --ignore-case, or we're a non-fixed pattern If those conditions were satisfied and we matched found non-valid UTF-8 data PCREv2 might bark on it, in practice this only happened under the JIT backend (turned on by default on most platforms). Ultimately this fixes a "regression" in `b65abcafc7` ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01), I'm putting that in scare-quotes because before then we wouldn't properly support these complex case-folding, locale etc. cases either, it just broke in different ways. There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed in PCREv2 10.36. It can be worked around by setting the PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add tests for the bug. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-24 16:09:17 -08:00
Ævar Arnfjörð Bjarmason	a4fea08b6e	grep/pcre2 tests: don't rely on invalid UTF-8 data test As noted in [1] when I originally added this test in [2] the test was completely broken as it lacked a redirect[3]. I now think this whole thing is overly fragile. Let's only test if we have a segfault here. Before this the first test's "test_cmp" was pretty meaningless. We were only testing if PCREv2 was so broken that it would spew out something completely unrelated on stdout, which isn't very plausible. In the second test we're relying on PCREv2 forever holding to the current behavior of the PCRE_UTF8 flag, as opposed to learning some optimistic graceful fallback to PCRE2_MATCH_INVALID_UTF in the future. If that happens having this test broken under bisecting would suck. A follow-up commit will actually test this case in a meaningful way under the PCRE2_MATCH_INVALID_UTF flag. Let's run this one unconditionally, and just make sure we don't segfault. 1. `e714b898c6` (t7812: expect failure for grep -i with invalid UTF-8 data, 2019-11-29) 2. `8a5999838e` (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26) 3. `c74b3cbb83` (t7812: add missing redirects, 2019-11-26) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-24 16:09:15 -08:00
Elijah Newren	557ac0350d	merge-ort: begin performance work; instrument with trace2_region_* calls Add some timing instrumentation for both merge-ort and diffcore-rename; I used these to measure and optimize performance in both, and several future patch series will build on these to reduce the timings of some select testcases. === Setup === The primary testcase I used involved rebasing a random topic in the linux kernel (consisting of 35 patches) against an older version. I added two variants, one where I rename a toplevel directory, and another where I only rebase one patch instead of the whole topic. The setup is as follows: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34 $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e $ git switch -c 5.4-renames v5.4 $ git mv drivers pilots # Introduce over 26,000 renames $ git commit -m "Rename drivers/ to pilots/" $ git config merge.renameLimit 30000 $ git config merge.directoryRenames true === Testcases === Now with REBASE standing for either "git rebase [--merge]" (using merge-recursive) or "test-tool fast-rebase" (using merge-ort), the testcases are: Testcase #1: no-renames $ git checkout v5.4^0 $ REBASE --onto HEAD base hwmon-updates Note: technically the name is misleading; there are some renames, but very few. Rename detection only takes about half the overall time. Testcase #2: mega-renames $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-updates Testcase #3: just-one-mega $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-just-one === Timing results === Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames, 10 runs for the other two cases): merge-recursive merge-ort no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s A single re-run of each with some breakdowns: --- no-renames --- merge-recursive merge-ort overall runtime: 19.302 s 14.257 s inexact rename detection: 7.603 s 7.906 s everything else: 11.699 s 6.351 s --- mega-renames --- merge-recursive merge-ort overall runtime: 5950.195 s 5499.672 s inexact rename detection: 5746.309 s 5487.120 s everything else: 203.886 s 17.552 s --- just-one-mega --- merge-recursive merge-ort overall runtime: 151.001 s 158.582 s inexact rename detection: 143.448 s 157.835 s everything else: 7.553 s 0.747 s === Timing observations === 0) Maximum speedup The "everything else" row represents the maximum speedup we could achieve if we were to somehow infinitely parallelize inexact rename detection, but leave everything else alone. The fact that this is so much smaller than the real runtime (even in the case with virtually no renames) makes it clear just how overwhelmingly large the time spent on rename detection can be. 1) no-renames 1a) merge-ort is faster than merge-recursive, which is nice. However, this still should not be considered good enough. Although the "merge" backend to rebase (merge-recursive) is sometimes faster than the "apply" backend, this is one of those cases where it is not. In fact, even merge-ort is slower. The "apply" backend can complete this testcase in 6.940 s ± 0.485 s which is about 2x faster than merge-ort and 3x faster than merge-recursive. One goal of the merge-ort performance work will be to make it faster than git-am on this (and similar) testcases. 2) mega-renames 2a) Obviously rename detection is a huge cost; it's where most the time is spent. We need to cut that down. If we could somehow infinitely parallelize it and drive its time to 0, the merge-recursive time would drop to about 204s, and the merge-ort time would drop to about 17s. I think this particular stat shows I've subtly baked a couple performance improvements into merge-ort and into fast-rebase already. 3) just-one-mega 3a) not much to say here, it just gives some flavor for how rebasing only one patch compares to rebasing 35. === Goals === This patch is obviously just the beginning. Here are some of my goals that this measurement will help us achieve: * Drive the cost of rename detection down considerably for merges * After the above has been achieved, see if there are other slowness factors (which would have previously been overshadowed by rename detection costs) which we can then focus on and also optimize. * Ensure our rebase testcase that requires little rename detection is noticeably faster with merge-ort than with apply-based rebase. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <ttaylorr@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Elijah Newren	5ced7c3da0	merge-ort: ignore the directory rename split conflict for now get_provisional_directory_renames() has code to detect directories being evenly split between different locations. However, as noted previously, if there are no new files added to that directory that was split evenly, our inability to determine where the directory was renamed to doesn't matter since there are no new files to try to move into the new location. Unfortunately, that code is unaware of whether there are new files under the directory in question and we just ignore that, causing us to fail t6423 test 2b but pass test 2a; turn off the error for now, swapping which tests pass and fail. The motivating reason for switching this off as a temporary measure is that as we add optimizations, we'll start looking at only subsets of renames, and subsets of renames can start switching the result we get when this error is (wrongly) on. Once we get enough optimizations, however, we can prevent that code from even running when there are no new files added to the relevant directory, at which point we can revert this commit and then both testcases 2a and 2b will pass simultaneously. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Elijah Newren	cf8937acde	merge-ort: fix massive leak When a series of merges was performed (such as for a rebase or series of cherry-picks), only the data structures allocated by the final merge operation were being freed. The problem was that while picking out pieces of merge-ort to upstream, I previously misread a certain section of merge_start() and assumed it was associated with a later optimization. Include that section now, which ensures that if there was a previous merge operation, that we clear out result->priv and then re-use it for opt->priv, and otherwise we allocate opt->priv. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Ævar Arnfjörð Bjarmason	7599730b7e	Remove support for v1 of the PCRE library Remove support for using version 1 of the PCRE library. Its use has been discouraged by upstream for a long time, and it's in a bugfix-only state. Anyone who was relying on v1 in particular got a nudge to move to v2 in `e6c531b808` (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1, 2018-03-11), which was first released as part of v2.18.0. With this the LIBPCRE2 test prerequisites is redundant to PCRE. But I'm keeping it for self-documentation purposes, and to avoid conflict with other in-flight PCRE patches. I'm also not changing all of our own "pcre2" names to "pcre", i.e. the inverse of `6d4b5747f0` (grep: change internal pcre variable & function names to be pcre1, 2017-05-25). I don't see the point, and it makes the history/blame harder to read. Maybe if there's ever a PCRE v3... Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 21:15:43 -08:00
Ævar Arnfjörð Bjarmason	0205bb13d0	config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag Remove a flag added in my `fb95e2e38d` (grep: un-break building with PCRE >= 8.32 without --enable-jit, 2017-06-01). It's set just below USE_LIBPCRE=YesPlease, so it's been redundant since `e6c531b808` (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1, 2018-03-11). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 21:15:12 -08:00
Derrick Stolee	19a0acc83e	t1092: test interesting sparse-checkout scenarios These also document some behaviors that differ from a full checkout, and possibly in a way that is not intended. The test is designed to be run with "--run=1,X" where 'X' is an interesting test case. Each test uses 'init_repos' to reset the full and sparse copies of the initial-repo that is created by the first test case. This also makes it possible to have test cases leave the working directory or index in unusual states without disturbing later cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:20 -08:00
Derrick Stolee	3b14436364	test-lib: test_region looks for trace2 regions From ff15d509b89edd4830d85d53cea3079a6b0c1c08 Mon Sep 17 00:00:00 2001 From: Derrick Stolee <dstolee@microsoft.com> Date: Mon, 11 Jan 2021 08:53:09 -0500 Subject: [PATCH 8/9] test-lib: test_region looks for trace2 regions Most test cases can verify Git's behavior using input/output expectations or changes to the .git directory. However, sometimes we want to check that Git did or did not run a certain section of code. This is particularly important for performance-only features that we want to ensure have been enabled in certain cases. Add a new 'test_region' function that checks if a trace2 region was entered and left in a given trace2 event log. There is one existing test (t0500-progress-display.sh) that performs this check already, so use the helper function instead. Note that this changes the expectations slightly. The old test (incorrectly) used two patterns for the 'grep' invocation, but this performs an OR of the patterns, not an AND. This means that as long as one region_enter event was logged, the test would succeed, even if it was not due to the progress category. More uses will be added in a later change. t6423-merge-rename-directories.sh also greps for region_enter lines, but it verifies the number of such lines, which is not the same as an existence check. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:18 -08:00
Derrick Stolee	dd23022acb	sparse-checkout: load sparse-checkout patterns A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	6a9372f4ef	name-hash: use trace2 regions for init The lazy_init_name_hash() populates a hashset with all filenames and another with all directories represented in the index. This is run only if we need to use the hashsets to check for existence or case-folding renames. Place trace2 regions where there is already a performance trace. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	1fd9ae517c	repository: add repo reference to index_state It will be helpful to add behavior to index operations that might trigger an object lookup. Since each index belongs to a specific repository, add a 'repo' pointer to struct index_state that allows access to this repository. Add a BUG() statement if the repo already has an index, and the index already has a repo, but somehow the index points to a different repo. This will prevent future changes from needing to pass an additional 'struct repository repo' parameter and instead rely only on the 'struct index_state istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	cae70acf24	fsmonitor: de-duplicate BUG()s around dirty bits The index has an fsmonitor_dirty bitmap that records which index entries are "dirty" based on the response from the FSMonitor. If this bitmap ever grows larger than the index, then there was an error in how it was constructed, and it was probably a developer's bug. There are several BUG() statements that are very similar, so replace these uses with a simpler assert_index_minimum(). Since there is one caller that uses a custom 'pos' value instead of the bit_size member, we cannot simplify it too much. However, the error string is identical in each, so this simplifies things. Be sure to add one when checking if a position if valid, since the minimum is a bound on the expected size. The end result is that the code is simpler to read while also preserving these assertions for developers in the FSMonitor space. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	c80dd3967f	cache-tree: extract subtree_pos() This method will be helpful to use outside of cache-tree.c in a later feature. The implementation is subtle due to subtree_name_cmp() sorting by length and then lexicographically. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	8d87e338e1	cache-tree: simplify verify_cache() prototype The verify_cache() method takes an array of cache entries and a count, but these are always provided directly from a struct index_state. Use a pointer to the full structure instead. There is a subtle point when istate->cache_nr is zero that subtracting one will underflow. This triggers a failure in t0000-basic.sh, among others. Use "i + 1 < istate->cache_nr" to avoid these strange comparisons. Convert i to be unsigned as well, which also removes the potential signed overflow in the unlikely case that cache_nr is over 2.1 billion entries. The 'funny' variable has a maximum value of 11, so making it unsigned does not change anything of importance. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	fb0882648e	cache-tree: clean up cache_tree_update() Make the method safer by allocating a cache_tree member for the given index_state if it is not already present. This is preferrable to a BUG() statement or returning with an error because future callers will want to populate an empty cache-tree using this method. Callers can also remove their conditional allocations of cache_tree. Also drop local variables that can be found directly from the 'istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Ævar Arnfjörð Bjarmason	db89a82b5b	rm tests: actually test for SIGPIPE in SIGPIPE test Change a test initially added in `50cd31c652` (t3600: comment on inducing SIGPIPE in `git rm`, 2019-11-27) to explicitly test for SIGPIPE using a pattern initially established in `7559a1be8a` (unblock and unignore SIGPIPE, 2014-09-18). The problem with using that pattern is that it requires us to skip the test on MINGW[1]. If we kept the test with its initial semantics[2] we'd get coverage there, at the cost of not checking whether we actually had SIGPIPE outside of MinGW. Arguably we should just remove this test. Between the test added in `7559a1be8a` and the change made in `12e0437f23` (common-main: call restore_sigpipe_to_default(), 2016-07-01) it's a bit arbitrary to only check this for "git rm". But in lieu of having wider test coverage for other "git" subcommands let's refactor this to explicitly test for SIGPIPE outside of MinGW, and then just that we remove the ".git/index.lock" (as before) on all platforms. 1. https://lore.kernel.org/git/xmqq1rec5ckf.fsf@gitster.c.googlers.com/ 2. `0693f9ddad` (Make sure lockfiles are unlocked when dying on SIGPIPE, 2008-12-18) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	60127996b5	archive tests: use a cheaper "zipinfo -h" invocation to get header Change an invocation of zipinfo added in `19ee29401d` (t5004: test ZIP archives with many entries, 2015-08-22) to simply ask zipinfo for the header info, rather than spewing out info about the entire archive and race to kill it with SIGPIPE due to the downstream "head -2". I ran across this because I'm adding a "set -o pipefail" test mode. This won't be needed for the version of the mode that I'm introducing (which currently relies on a patch to GNU bash), but I think this is a good idea anyway. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	9aebc4708a	upload-pack tests: avoid a non-zero "grep" exit status Continue changing a test that `763b47bafa` (t5703: stop losing return codes of git commands, 2019-11-27) already refactored. This was originally added as part of a series to add support for running under bash's "set -o pipefail", under that mode this test will fail because sometimes there's no commits in the "objs" output. It's easier to fix that than exempt these tests under a hypothetical "set -o pipefail" test mode. It looks like we probably won't have that, but once we've dug this code up let's refactor it[2] so we don't hide a potential pipe failure. 1. https://lore.kernel.org/git/xmqqzh18o8o6.fsf@gitster.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Jeff King	796c248dc1	git-svn tests: rewrite brittle tests to use "--[no-]merges". Rewrite a brittle tests which used "rev-list" without "--[no-]merges" to figure out if a set of commits turned into merge commits or not. Signed-off-by: Jeff King <peff@peff.net> [ÆAB: wrote commit message] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	f918a89e50	git svn mergeinfo tests: refactor "test -z" to use test_must_be_empty Refactor some old-style test code to use test_must_be_empty instead of "test -z". This makes a follow-up commit easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	4669917e8f	git svn mergeinfo tests: modernize redirection & quoting style Use "<file" instead of "< file", and don't put the closing quote for strings on an indented line. This makes a follow-up refactoring commit easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	ef83970059	cache-tree tests: explicitly test HEAD and index differences The test code added in `9c4d6c0297` (cache-tree: Write updated cache-tree after commit, 2014-07-13) used "ls-files" in lieu of "ls-tree" because it wanted to test the data in the index, since this test is testing the cache-tree extension. Change the test to instead use "ls-tree" for traversal, and then explicitly check how HEAD differs from the index. This is more easily understood, and less fragile as numerous past bug fixes[1][2][3] to the old code we're replacing demonstrate. As an aside this would be a bit easier if empty pathspecs hadn't been made an error in `d426430e6e` (pathspec: warn on empty strings as pathspec, 2016-06-22) and `9e4e8a64c2` (pathspec: die on empty strings as pathspec, 2017-06-06). If that was still allowed this code could be simplified slightly: diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh index 9bf66c9e68..0b02881f55 100755 --- a/t/t0090-cache-tree.sh +++ b/t/t0090-cache-tree.sh @@ -18,19 +18,18 @@ cmp_cache_tree () { # test-tool dump-cache-tree already verifies that all existing data is # correct. generate_expected_cache_tree () { - pathspec="$1" && - dir="$2${2:+/}" && + pathspec="$1${1:+/}" && git ls-tree --name-only HEAD -- "$pathspec" >files && git ls-tree --name-only -d HEAD -- "$pathspec" >subtrees && - printf "SHA %s (%d entries, %d subtrees)\n" "$dir" $(wc -l <files) $(wc -l <subtrees) && + printf "SHA %s (%d entries, %d subtrees)\n" "$pathspec" $(wc -l <files) $(wc -l <subtrees) && while read subtree do - generate_expected_cache_tree "$pathspec/$subtree/" "$subtree" \|\| return 1 + generate_expected_cache_tree "$subtree" \|\| return 1 done <subtrees } test_cache_tree () { - generate_expected_cache_tree "." >expect && + generate_expected_cache_tree >expect && cmp_cache_tree expect && rm expect actual files subtrees && git status --porcelain -- ':!status' ':!expected.status' >status && 1. `c8db708d5d` (t0090: avoid passing empty string to printf %d, 2014-09-30) 2. `d69360c6b1` (t0090: tweak awk statement for Solaris /usr/xpg4/bin/awk, 2014-12-22) 3. `9b5a9fa60a` (t0090: stop losing return codes of git commands, 2019-11-27) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	fa6edee776	cache-tree tests: use a sub-shell with less indirection Change a "cd xyz && work && cd .." pattern introduced in `9c4d6c0297` (cache-tree: Write updated cache-tree after commit, 2014-07-13) to use a sub-shell instead with less indirection. We did actually recover correctly if we failed in this function since we were wrapped in a subshell one function call up. Let's just use the sub-shell at the point where we want to change the directory instead. It's important that the "\|\| return 1" is outside the subshell. Normally, we `exit 1` from within subshells[1], but that wouldn't help us exit this loop early[1][2]. Since we can get rid of the wrapper function let's rename the main function to drop the "rec" (for "recursion") suffix[3]. 1. https://lore.kernel.org/git/CAPig+cToj8nQmyBCqC1k7DXF2vXaonCEA-fCJ4x7JBZG2ixYBw@mail.gmail.com/ 2. https://lore.kernel.org/git/20150325052952.GE31924@peff.net/ 3. https://lore.kernel.org/git/YARsCsgXuiXr4uFX@coredump.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	3226725507	cache-tree tests: remove unused $2 parameter Remove the $2 paramater. This appears to have been some work-in-progress code from an earlier version of `9c4d6c0297` (cache-tree: Write updated cache-tree after commit, 2014-07-13) which was left in the final version. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:12 -08:00
Ævar Arnfjörð Bjarmason	3f96d75ef5	cache-tree tests: refactor for modern test style Refactor the cache-tree test file to use our current recommended patterns. This makes a subsequent meaningful change easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 13:25:11 -08:00
ZheNing Hu	93a7d9835f	ls-files.c: add --deduplicate option During a merge conflict, the name of a file may appear multiple times in "git ls-files" output, once for each stage. If you use both `--delete` and `--modify` at the same time, the output may mention a deleted file twice. When none of the '-t', '-u', or '-s' options is in use, these duplicate entries do not add much value to the output. Introduce a new '--deduplicate' option to suppress them. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: extended doc and rewritten commit log] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 11:48:20 -08:00
ZheNing Hu	ed644d1666	ls_files.c: consolidate two for loops into one This will make it easier to show only one entry per filename in the next step. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: corrected the log message] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 11:48:20 -08:00
ZheNing Hu	f1c462ea41	ls_files.c: bugfix for --deleted and --modified This situation may occur in the original code: lstat() failed but we use `&st` to feed ie_modified() later. Therefore, we can directly execute show_ce without the judgment of ie_modified() when lstat() has failed. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: fixed misindented code] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 11:48:11 -08:00
Taylor Blau	b3970c702c	ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets ls-refs performs a single revision walk over the whole ref namespace, and sends ones that match with one of the given ref prefixes down to the user. This can be expensive if there are many refs overall, but the portion of them covered by the given prefixes is small by comparison. To attempt to reduce the difference between the number of refs traversed, and the number of refs sent, only traverse references which are in the longest common prefix of the given prefixes. This is very reminiscent of the approach taken in `b31e2680c4` (ref-filter.c: find disjoint pattern prefixes, 2019-06-26) which does an analogous thing for multi-patterned 'git for-each-ref' invocations. The callback 'send_ref' is resilient to ignore extra patterns by discarding any arguments which do not begin with at least one of the specified prefixes. Similarly, the code introduced in `b31e2680c4` is resilient to stop early at metacharacters, but we only pass strict prefixes here. At worst we would return too many results, but the double checking done by send_ref will throw away anything that doesn't start with something in the prefix list. Finally, if no prefixes were provided, then implicitly add the empty string (which will match all references) since this matches the existing behavior (see the "no restrictions" comment in "ls-refs.c:ref_match()"). Original-patch-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Jacob Vosmaer	83befd3724	ls-refs.c: initialize 'prefixes' before using it Correctly initialize the "prefixes" strvec using strvec_init() instead of simply zeroing it via the earlier memset(). There's no way to trigger a crash, since the first 'ref-prefix' command will initialize the strvec via the 'ALLOC_GROW' in 'strvec_push_nodup()' (the alloc and nr variables are already zero'd, so the call to ALLOC_GROW is valid). If no "ref-prefix" command was given, then the call to 'ls-refs.c:ref_match()' will abort early after it reads the zero in 'prefixes->nr'. Likewise, strvec_clear() will only call free() on the array, which is NULL, so we're safe there, too. But, all of this is dangerous and requires more reasoning than it would if we simply called 'strvec_init()', so do that. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Taylor Blau	16b1985be5	refs: expose 'for_each_fullref_in_prefixes' This function was used in the ref-filter.c code to find the longest common prefix of among a set of refspecs, and then to iterate all of the references that descend from that prefix. A future patch will want to use that same code from ls-refs.c, so prepare by exposing and moving it to refs.c. Since there is nothing specific to the ref-filter code here (other than that it was previously the only caller of this function), this really belongs in the more generic refs.h header. The code moved in this patch is identical before and after, with the one exception of renaming some arguments to be consistent with other functions exposed in refs.h. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Jacob Vosmaer	be18153b97	builtin/pack-objects.c: avoid iterating all refs In git-pack-objects, we iterate over all the tags if the --include-tag option is passed on the command line. For some reason this uses for_each_ref which is expensive if the repo has many refs. We should use for_each_tag_ref instead. Because the add_ref_tag callback will now only visit tags we simplified it a bit. The motivation for this change is that we observed performance issues with a repository on gitlab.com that has 500,000 refs but only 2,000 tags. The fetch traffic on that repo is dominated by CI, and when we changed CI to fetch with 'git fetch --no-tags' we saw a dramatic change in the CPU profile of git-pack-objects. This lead us to this particular ref walk. More details in: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598 Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 17:27:42 -08:00
Jeff King	ee4e22554f	run-command: document use_shell option It's unclear how run-command's use_shell option should impact the arguments fed to a command. Plausibly it could mean that we glue all of the arguments together into a string to pass to the shell, in which case that opens the question of whether the caller needs to quote them. But in fact we don't implement it that way (and even if we did, we'd probably auto-quote the arguments as part of the glue step). And we must not receive quoted arguments, because we might actually optimize out the shell entirely (i.e., the caller does not even know if a shell will be involved in the end or not). Since this ambiguity may have been the cause of a recent bug, let's document the option a bit. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 14:21:32 -08:00
Jiang Xin	822ee894f6	t5411: refactor check of refs using test_cmp_refs Add new helper 'test_cmp_refs' to check references in a repository. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 13:09:06 -08:00
Jiang Xin	8388a64cd1	t5411: use different out file to prevent overwriting SZEDER reported that t5411 failed in Travis CI's s390x environment a couple of times, and could be reproduced with '--stress' test on this specific environment. The test failure messages might look like this: + test_cmp expect actual --- expect 2021-01-17 21:55:23.430750004 +0000 +++ actual 2021-01-17 21:55:23.430750004 +0000 @@ -1 +1 @@ -<COMMIT-A> refs/heads/main +<COMMIT-A> refs/heads/maifatal: the remote end hung up unexpectedly error: last command exited with $?=1 not ok 86 - proc-receive: not support push options (builtin protocol) The file 'actual' is filtered from the file 'out' which contains result of 'git show-ref' command. Due to the error messages from other process is written into the file 'out' accidentally, t5411 failed. SZEDER finds the root cause of this issue: - 'git push' is executed with its standard output and error redirected to the file 'out'. - 'git push' executes 'git receive-pack' internally, which inherits the open file descriptors, so its output and error goes into that same 'out' file. - 'git push' ends without waiting for the close of 'git-receive-pack' for some cases, and the file 'out' is reused for test of 'git show-ref' afterwards. - A mixture of the output of 'git show-ref' abd 'git receive-pack' leads to this issue. The first intuitive reaction to resolve this issue is to remove the file 'out' after use, so that the newly created file 'out' will have a different file descriptor and will not be overwritten by the 'git receive-pack' process. But Johannes pointed out that removing an open file is not possible on Windows. So we use different temporary file names to store the output of 'git push' to solve this issue. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Johannes Sixt <j6t@kdbg.org> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 13:09:04 -08:00
Phil Hord	8198907795	use delete_refs when deleting tags or branches 'git tag -d' accepts one or more tag refs to delete, but each deletion is done by calling `delete_ref` on each argv. This is very slow when removing from packed refs. Use delete_refs instead so all the removals can be done inside a single transaction with a single update. Do the same for 'git branch -d'. Since delete_refs performs all the packed-refs delete operations inside a single transaction, if any of the deletes fail then all them will be skipped. In practice, none of them should fail since we verify the hash of each one before calling delete_refs, but some network error or odd permissions problem could have different results after this change. Also, since the file-backed deletions are not performed in the same transaction, those could succeed even when the packed-refs transaction fails. After deleting branches, remove the branch config only if the branch ref was removed and was not subsequently added back in. A manual test deleting 24,000 tags took about 30 minutes using delete_ref. It takes about 5 seconds using delete_refs. Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 16:05:05 -08:00
Jeff King	36a317929b	refs: switch peel_ref() to peel_iterated_oid() The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char refname, const struct object_id oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:51:31 -08:00
Ævar Arnfjörð Bjarmason	73c01d25fe	tests: remove uses of GIT_TEST_GETTEXT_POISON=false As noted in previous commits we are removing the use of GIT_TEST_GETTEXT_POISON=false. These tests all relied on the facility being off, it always is off after an earlier change, but we hadn't removed the redundant assignments to "false" in the tests. I'm preserving the deletion of "error" lines in `38b9197a76` (t5411: add basic test cases for proc-receive hook, 2020-08-27), it turns out that's useful even without GIT_TEST_GETTEXT_POISON=true in play. Update a comment added in that commit to note that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:50:03 -08:00
Ævar Arnfjörð Bjarmason	d162b25f95	tests: remove support for GIT_TEST_GETTEXT_POISON This removes the ability to inject "poison" gettext() messages via the GIT_TEST_GETTEXT_POISON special test setup. I initially added this as a compile-time option in `bb946bba76` (i18n: add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and most recently modified to be toggleable at runtime in `6cdccfce1e` (i18n: make GETTEXT_POISON a runtime option, 2018-11-08).. The reason for its removal is that the trade-off of maintaining it v.s. what it's getting us has long since flipped. When gettext was integrated in `5e9637c629` (i18n: add infrastructure for translating Git with gettext, 2011-11-18) there was understandable concern on the Git ML that in marking messages for translation en-masse we'd inadvertently mark plumbing messages. The GETTEXT_POISON facility was a way to smoke those out via our test suite. Nowadays however we're done (or almost entirely done) with any marking of messages for translation. New messages are usually marked by their authors, who'll know whether it makes sense to translate them or not. If not any errors in marking the messages are much more likely to be spotted in review than in the the initial deluge of i18n patches in the 2011-2012 era. So let's just remove this. This leaves the test suite in a state where we still have a lot of test_i18n, C_LOCALE_OUTPUT etc. uses. Subsequent commits will remove those too. The change to t/lib-rebase.sh is a selective revert of the relevant part of `f2d17068fd` (i18n: rebase-interactive: mark comments of squash for translation, 2016-06-17), and the comment in t/t3406-rebase-message.sh is from `c7108bf9ed` (i18n: rebase: mark messages for translation, 2012-07-25). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:50:01 -08:00
Ævar Arnfjörð Bjarmason	6c280b4142	ci: remove GETTEXT_POISON jobs A subsequent commit will remove GETTEXT_POISON entirely, let's start by removing the CI jobs that enable the option. We cannot just remove the job because the CI is implicitly depending on the "poison" job being a sort of "default" job in the sense that it's the job that was otherwise run with the default compiler, no other GIT_TEST_* options etc. So let's keep it under the name "linux-gcc-default". This means we can remove the initial "make test" from the "linux-gcc" job (it does another one after setting a bunch of GIT_TEST_* variables). I'm not doing that because it would conflict with the in-flight `334afbc76f` (tests: mark tests relying on the current default for `init.defaultBranch`, 2020-11-18) (currently on the "seen" branch, so the SHA-1 will almost definitely change). It's going to use that "make test" again for different reasons, so let's preserve it for now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:50:00 -08:00
Junio C Hamano	fe2f4d0031	Merge branch 'en/ort-directory-rename' into en/merge-ort-perf * en/ort-directory-rename: (28 commits) merge-ort: fix a directory rename detection bug merge-ort: process_renames() now needs more defensiveness merge-ort: implement apply_directory_rename_modifications() merge-ort: add a new toplevel_dir field merge-ort: implement handle_path_level_conflicts() merge-ort: implement check_for_directory_rename() merge-ort: implement apply_dir_rename() and check_dir_renamed() merge-ort: implement compute_collisions() merge-ort: modify collect_renames() for directory rename handling merge-ort: implement handle_directory_level_conflicts() merge-ort: implement compute_rename_counts() merge-ort: copy get_renamed_dir_portion() from merge-recursive.c merge-ort: add outline of get_provisional_directory_renames() merge-ort: add outline for computing directory renames merge-ort: collect which directories are removed in dirs_removed merge-ort: initialize and free new directory rename data structures merge-ort: add new data structures for directory rename detection merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions ...	2021-01-20 22:52:50 -08:00
Elijah Newren	203c872c4f	merge-ort: fix a directory rename detection bug As noted in commit `902c521a35` ("t6423: more involved directory rename test", 2020-10-15), when we have a case where * dir/subdir/ has several files * almost all files in dir/subdir/ are renamed to folder/subdir/ * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/ * the other side of history (that doesn't do the renames) adds a new file to dir/subdir/ Then for the majority of the file renames, the directory rename of dir/subdir/ -> folder/subdir/ is actually not represented that way but as dir/ -> folder/ We also had one rename that was represented as dir/subdir/ -> folder/subdir/newsubdir/ Now, since there's a new file in dir/subdir/, where does it go? Well, there's only one rule for dir/subdir/, so the code previously noted that this rule had the "majority" of the one "relevant" rename and thus erroneously used it to place the file in folder/subdir/newsubdir/. We really want the heavy weight associated with dir/ -> folder/ to also be treated as dir/subdir/ -> folder/subdir/, so that we correctly place the file in folder/subdir/. Add a bunch of logic to make sure that we use all relevant renamings in directory rename detection. Note that testcase 12f of t6423 still fails after this, but it gets further than merge-recursive does. There are some performance related bits in that testcase (the region_enter messages) that do not yet succeed, but the rest of the testcase works after this patch. Subsequent patch series will fix up the performance side. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	1b6b902d95	merge-ort: process_renames() now needs more defensiveness Since directory rename detection adds new paths to opt->priv->paths and removes old ones, process_renames() needs to now check whether pair->one->path actually exists in opt->priv->paths instead of just assuming it does. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	089d82bc18	merge-ort: implement apply_directory_rename_modifications() This function roughly follows the same outline as the function of the same name from merge-recursive.c, but the code diverges in multiple ways due to some special considerations: * merge-ort's version needs to update opt->priv->paths with any new paths (and opt->priv->paths points to struct conflict_infos which track quite a bit of metadata for each path); merge-recursive's version would directly update the index * merge-ort requires that opt->priv->paths has any leading directories of any relevant files also be included in the set of paths. And due to pointer equality requirements on merged_info.directory_name, we have to be careful how we compute and insert these. * due to the above requirements on opt->priv->paths, merge-ort's version starts with a long comment to explain all the special considerations that need to be handled * merge-ort can use the full data stored in opt->priv->paths to avoid making expensive get_tree_entry() calls to regather the necessary data. * due to messages being deferred automatically in merge-ort, this is the best place to handle conflict messages whereas in merge-recursive.c they are deferred manually so that processing of entries does all the printing Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	05b85c6eeb	merge-ort: add a new toplevel_dir field Due to the string-equality-iff-pointer-equality requirements placed on merged_info.directory_name, apply_directory_rename_modifications() will need to have access to the exact toplevel directory name string pointer and can't just use a new empty string. Store it in a field that we can use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	bea433655a	merge-ort: implement handle_path_level_conflicts() This is copied from merge-recursive.c, with minor tweaks due to: * using strmap API * merge-ort not using the non_unique_new_dir field, since it'll obviate its need entirely later with performance improvements * adding a new path_in_way() function that uses opt->priv->paths instead of doing an expensive tree_has_path() lookup to see if a tree has a given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	47325e8533	merge-ort: implement check_for_directory_rename() This is copied from merge-recursive.c, with minor tweaks due to using strmap API and the fact that it can use opt->priv->paths to get all pathnames that exist instead of taking a tree object. This depends on a new function, handle_path_level_conflicts(), which just has a placeholder die-not-yet-implemented implementation for now; a subsequent patch will implement it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	fbcfc0cc17	merge-ort: implement apply_dir_rename() and check_dir_renamed() Both of these are copied from merge-recursive.c, with just minor tweaks due to using strmap API and not having a non_unique_new_dir field. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	d9d015df4a	merge-ort: implement compute_collisions() This is nearly a wholesale copy of compute_collisions() from merge-recursive.c, and the logic remains the same, but it has been tweaked slightly due to: * using strmap.h API (instead of direct hashmaps) * allocation/freeing of data structures were done separately in merge_start() and clear_or_reinit_internal_opts() in an earlier patch in this series * there is no non_unique_new_dir data field in merge-ort; that will be handled a different way It does depend on two new functions, apply_dir_rename() and check_dir_renamed() which were introduced with simple die-not-yet-implemented shells and will be implemented in subsequent patches. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	fa5e06d690	merge-ort: modify collect_renames() for directory rename handling collect_renames() is similar to merge-recursive.c's get_renames(), but lacks the directory rename handling found in the latter. Port that code structure over to merge-ort. This introduces three new die-not-yet-implemented functions that will be defined in future commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	98d0d08128	merge-ort: implement handle_directory_level_conflicts() This is modelled on the version of handle_directory_level_conflicts() from merge-recursive.c, but is massively simplified due to the following factors: * strmap API provides simplifications over using direct hashmap * we have a dirs_removed field in struct rename_info that we have an easy way to populate from collect_merge_info(); this was already used in compute_rename_counts() and thus we do not need to check for condition #2. * The removal of condition #2 by handling it earlier in the code also obviates the need to check for condition #3 -- if both sides renamed a directory, meaning that the directory no longer exists on either side, then neither side could have added any new files to that directory, and thus there are no files whose locations we need to move due to such a directory rename. In fact, the same logic that makes condition #3 irrelevant means condition #1 is also irrelevant so we could drop this function. However, it is cheap to check if both sides rename the same directory, and doing so can save future computation. So, simply remove any directories that both sides renamed from the list of directory renames. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	2f620a4f19	merge-ort: implement compute_rename_counts() This function is based on the first half of get_directory_renames() from merge-recursive.c; as part of the implementation, factor out a routine, increment_count(), to update the bookkeeping to track the number of items renamed into new directories. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	9fe37e7bb9	merge-ort: copy get_renamed_dir_portion() from merge-recursive.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	04264d4079	merge-ort: add outline of get_provisional_directory_renames() This function is based on merge-recursive.c's get_directory_renames(), except that the first half has been split out into a not-yet-implemented compute_rename_counts(). The primary difference here is our lack of the non_unique_new_dir boolean in our strmap. The lack of that field will at first cause us to fail testcase 2b of t6423; however, future optimizations will obviate the need for that ugly field so we have just left it out. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	112e11126b	merge-ort: add outline for computing directory renames Port some directory rename handling changes from merge-recursive.c's detect_and_process_renames() to the same-named function of merge-ort.c. This does not yet add any use or handling of directory renames, just the outline for where we start to compute them. Thus, a future patch will add port additional changes to merge-ort's detect_and_process_renames(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Derrick Stolee	3cf5f221be	t7900: clean up some broken refs The tests for the 'prefetch' task create remotes and fetch refs into 'refs/prefetch/<remote>/' and tags into 'refs/tags/'. These tests use the remotes to create objects not intended to be seen by the "local" repository. In that sense, the incrmental-repack tasks did not have these objects and refs in mind. That test replaces the object directory with a specific pack-file layout for testing the batch-size logic. However, this causes some operations to start showing warnings such as: error: refs/prefetch/remote1/one does not point to a valid object! error: refs/tags/one does not point to a valid object! This only shows up if you run the tests verbosely and watch the output. It caught my eye and I _thought_ that there was a bug where 'git gc' or 'git repack' wouldn't check 'refs/prefetch/' before pruning objects. That is incorrect. Those commands do handle 'refs/prefetch/' correctly. All that is left is to clean up the tests in t7900-maintenance.sh to remove these tags and refs that are not being repacked for the incremental-repack tests. Use update-ref to ensure this works with all ref backends. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 18:46:22 -08:00
Derrick Stolee	96eaffebbf	maintenance: set log.excludeDecoration durin prefetch The 'prefetch' task fetches refs from all remotes and places them in the refs/prefetch/<remote>/ refspace. As this task is intended to run in the background, this allows users to keep their local data very close to the remote servers' data while not updating the users' understanding of the remote refs in refs/remotes/<remote>/. However, this can clutter 'git log' decorations with copies of the refs with the full name 'refs/prefetch/<remote>/<branch>'. The log.excludeDecoration config option was added in `a6be5e67` (log: add log.excludeDecoration config option, 2020-05-16) for exactly this purpose. Ensure we set this only for users that would benefit from it by assigning it at the beginning of the prefetch task. Other alternatives would be during 'git maintenance register' or 'git maintenance start', but those might assign the config even when the prefetch task is disabled by existing config. Further, users could run 'git maintenance run --task=prefetch' using their own scripting or scheduling. This provides the best coverage to automatically update the config when valuable. It is improbable, but possible, that users might want to run the prefetch task _and_ see these refs in their log decorations. This seems incredibly unlikely to me, but users can always opt-in on a command-by-command basis using --decorate-refs=refs/prefetch/. Test that this works in a few cases. In particular, ensure that our assignment of log.excludeDecoration=refs/prefetch/ is additive to other existing exclusions. Further, ensure we do not add multiple copies in multiple runs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 18:46:22 -08:00
brian m. carlson	1fb5cf0da6	commit: ignore additional signatures when parsing signed commits When we create a commit with multiple signatures, neither of these signatures includes the other. Consequently, when we produce the payload which has been signed so we can verify the commit, we must strip off any other signatures, or the payload will differ from what was signed. Do so, and in preparation for verifying with multiple algorithms, pass the algorithm we want to verify into parse_signed_commit. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 17:38:20 -08:00
brian m. carlson	83dff3eb2e	ref-filter: switch some uses of unsigned long to size_t In the future, we'll want to pass some of the arguments of find_subpos to strbuf_detach, which takes a size_t. This is fine on systems where that's the same size as unsigned long, but that isn't the case on all systems. Moreover, size_t makes sense since it's not possible to use a buffer here that's larger than memory anyway. Let's switch each use to size_t for these lengths in grab_sub_body_contents and find_subpos. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 17:38:19 -08:00
Abhishek Kumar	5a3b130cad	doc: add corrected commit date info With generation data chunk and corrected commit dates implemented, let's update the technical documentation for commit-graph. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	8d00d7c3df	commit-reach: use corrected commit dates in paint_down_to_common() `091f4cf` (commit: don't use generation numbers if not needed, 2018-08-30) changed paint_down_to_common() to use commit dates instead of generation numbers v1 (topological levels) as the performance regressed on certain topologies. With generation number v2 (corrected commit dates) implemented, we no longer have to rely on commit dates and can use generation numbers. For example, the command `git merge-base v4.8 v4.9` on the Linux repository walks 167468 commits, taking 0.135s for committer date and 167496 commits, taking 0.157s for corrected committer date respectively. While using corrected commit dates, Git walks nearly the same number of commits as commit date, the process is slower as for each comparision we have to access a commit-slab (for corrected committer date) instead of accessing struct member (for committer date). This change incidentally broke the fragile t6404-recursive-merge test. t6404-recursive-merge sets up a unique repository where all commits have the same committer date without a well-defined merge-base. While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer date as a heuristic in paint_down_to_common(). 6404.1 'combined merge conflicts' merges commits in the order: - Merge C with B to form an intermediate commit. - Merge the intermediate commit with A. With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently use the corrected committer date, which changes the order in which commits are merged: - Merge A with B to form an intermediate commit. - Merge the intermediate commit with C. While resulting repositories are equivalent, 6404.4 'virtual trees were processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting different merge-bases and thus have different object ids for the intermediate commits. As this has already causes problems (as noted in `859fdc0` (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph within t6404-recursive-merge. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	1fdc383c5c	commit-graph: use generation v2 only if entire chain does Since there are released versions of Git that understand generation numbers in the commit-graph's CDAT chunk but do not understand the GDAT chunk, the following scenario is possible: 1. "New" Git writes a commit-graph with the GDAT chunk. 2. "Old" Git writes a split commit-graph on top without a GDAT chunk. If each layer of split commit-graph is treated independently, as it was the case before this commit, with Git inspecting only the current layer for chunk_generation_data pointer, commits in the lower layer (one with GDAT) whould have corrected commit date as their generation number, while commits in the upper layer would have topological levels as their generation. Corrected commit dates usually have much larger values than topological levels. This means that if we take two commits, one from the upper layer, and one reachable from it in the lower layer, then the expectation that the generation of a parent is smaller than the generation of a child would be violated. It is difficult to expose this issue in a test. Since we _start_ with artificially low generation numbers, any commit walk that prioritizes generation numbers will walk all of the commits with high generation number before walking the commits with low generation number. In all the cases I tried, the commit-graph layers themselves "protect" any incorrect behavior since none of the commits in the lower layer can reach the commits in the upper layer. This issue would manifest itself as a performance problem in this case, especially with something like "git log --graph" since the low generation numbers would cause the in-degree queue to walk all of the commits in the lower layer before allowing the topo-order queue to write anything to output (depending on the size of the upper layer). Therefore, When writing the new layer in split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees that if a layer has GDAT chunk, all lower layers must have a GDAT chunk as well. Rewriting layers follows similar approach: if the topmost layer below the set of layers being rewritten (in the split commit-graph chain) exists, and it does not contain GDAT chunk, then the result of rewrite does not have GDAT chunks either. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	e8b63005c4	commit-graph: implement generation data chunk As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	c1a09119f6	commit-graph: implement corrected commit date With most of preparations done, let's implement corrected commit date. The corrected commit date for a commit is defined as: * A commit with no parents (a root commit) has corrected commit date equal to its committer date. * A commit with at least one parent has corrected commit date equal to the maximum of its commit date and one more than the largest corrected commit date among its parents. As a special case, a root commit with timestamp of zero (01.01.1970 00:00:00Z) has corrected commit date of one, to be able to distinguish from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit date). To minimize the space required to store corrected commit date, Git stores corrected commit date offsets into the commit-graph file. The corrected commit date offset for a commit is defined as the difference between its corrected commit date and actual commit date. Storing corrected commit date requires sizeof(timestamp_t) bytes, which in most cases is 64 bits (uintmax_t). However, corrected commit date offsets can be safely stored using only 32-bits. This halves the size of GDAT chunk, which is a reduction of around 6% in the size of commit-graph file. However, using offsets be problematic if a commit is malformed but valid and has committer date of 0 Unix time, as the offset would be the same as corrected commit date and thus require 64-bits to be stored properly. While Git does not write out offsets at this stage, Git stores the corrected commit dates in member generation of struct commit_graph_data. It will begin writing commit date offsets with the introduction of generation data chunk. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	d7f92784c6	commit-graph: return 64-bit generation number In a preparatory step for introducing corrected commit dates, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	72a2bfcaf0	commit-graph: add a slab to store topological levels In a later commit we will introduce corrected commit date as the generation number v2. Corrected commit dates will be stored in the new seperate Generation Data chunk. However, to ensure backwards compatibility with "Old" Git we need to continue to write generation number v1 (topological levels) to the commit data chunk. Thus, we need to compute and store both versions of generation numbers to write the commit-graph file. Therefore, let's introduce a commit-slab `topo_level_slab` to store topological levels; corrected commit date will be stored in the member `generation` of struct commit_graph_data. The macros `GENERATION_NUMBER_INFINITY` and `GENERATION_NUMBER_ZERO` mark commits not in the commit-graph file and commits written by a version of Git that did not compute generation numbers respectively. Generation numbers are computed identically for both kinds of commits. A "slab-miss" should return `GENERATION_NUMBER_INFINITY` as the commit is not in the commit-graph file. However, since the slab is zero-initialized, it returns 0 (or rather `GENERATION_NUMBER_ZERO`). Thus, we no longer need to check if the topological level of a commit is `GENERATION_NUMBER_INFINITY`. We will add a pointer to the slab in `struct write_commit_graph_context` and `struct commit_graph` to populate the slab in `fill_commit_graph_info` if the commit has a pre-computed topological level as in case of split commit-graphs. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	c0ef139843	t6600-test-reach: generalize _three_modes In a preparatory step to implement generation number v2, we add tests to ensure Git can read and parse commit-graph files without Generation Data chunk. These files represent commit-graph files written by Old Git and are neccesary for backward compatability. We extend run_three_modes() and test_three_modes() to _all_modes() with the fourth mode being "commit-graph without generation data chunk". Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	f90fca638e	commit-graph: consolidate fill_commit_graph_info Both fill_commit_graph_info() and fill_commit_in_graph() parse information present in commit data chunk. Let's simplify the implementation by calling fill_commit_graph_info() within fill_commit_in_graph(). fill_commit_graph_info() used to not load committer data from commit data chunk. However, with the upcoming switch to using corrected committer date as generation number v2, we will have to load committer date to compute generation number value anyway. `e51217e15` (t5000: test tar files that overflow ustar headers, 30-06-2016) introduced a test 'generate tar with future mtime' that creates a commit with committer date of (2^36 + 1) seconds since EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus committer time overflows into generation number (within CDAT chunk) and has undefined behavior. The test used to pass as fill_commit_graph_info() would not set struct member `date` of struct commit and load committer date from the object database, generating a tar file with the expected mtime. However, with corrected commit date, we will load the committer date from CDAT chunk (truncated to lower 34-bits to populate the generation number. Thus, Git sets date and generates tar file with the truncated mtime. The ustar format (the header format used by most modern tar programs) only has room for 11 (or 12, depending on some implementations) octal digits for the size and mtime of each file. As the CDAT chunk is overflow by 12-octal digits but not 11-octal digits, we split the existing tests to test both implementations separately and add a new explicit test for 11-digit implementation. To test the 11-octal digit implementation, we create a future commit with committer date of 2^34 - 1, which overflows 11-octal digits without overflowing 34-bits of the Commit Date chunks. To test the 12-octal digit implementation, the smallest committer date possible is 2^36 + 1, which overflows the CDAT chunk and thus commit-graph must be disabled for the test. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	2f9bbb6d91	revision: parse parent in indegree_walk_step() In indegree_walk_step(), we add unvisited parents to the indegree queue. However, parents are not guaranteed to be parsed. As the indegree queue sorts by generation number, let's parse parents before inserting them to ensure the correct priority order. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	e30c5ee76c	commit-graph: fix regression when computing Bloom filters Before computing Bloom filters, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. `3d11275505` (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since `c49c82aa4c` (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for a 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_gen_cmp()' falls back to comparing commits by their date when they have equal generation number, and so since `c49c82aa4c` is purely a date comparison function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. Applying this patch (compared with v2.30.0) speeds up computing Bloom filters by factors ranging from 0.40% to 5.19% on various repositories [1]. So, avoid the useless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. [1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:17 -08:00
Derrick Stolee	a4b6d202ca	cache-tree: speed up consecutive path comparisons The previous change reduced time spent in strlen() while comparing consecutive paths in verify_cache(), but we can do better. The conditional checks the existence of a directory separator at the correct location, but only after doing a string comparison. Swap the order to be logically equivalent but perform fewer string comparisons. To test the effect on performance, I used a repository with over three million paths in the index. I then ran the following command on repeat: git -c index.threads=1 commit --amend --allow-empty --no-edit Here are the measurements over 10 runs after a 5-run warmup: Benchmark #1: v2.30.0 Time (mean ± σ): 854.5 ms ± 18.2 ms Range (min … max): 825.0 ms … 892.8 ms Benchmark #2: Previous change Time (mean ± σ): 833.2 ms ± 10.3 ms Range (min … max): 815.8 ms … 849.7 ms Benchmark #3: This change Time (mean ± σ): 815.5 ms ± 18.1 ms Range (min … max): 795.4 ms … 849.5 ms This change is 2% faster than the previous change and 5% faster than v2.30.0. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:05:13 -08:00
René Scharfe	0b72536a0b	cache-tree: use ce_namelen() instead of strlen() Use the name length field of cache entries instead of calculating its value anew. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:05:13 -08:00
Derrick Stolee	4bdde337f4	index-format: discuss recursion of cache-tree better The end of the cache tree index extension format trails off with ellipses ever since `23fcc98` (doc: technical details about the index file format, 2011-03-01). While an intuitive reader could gather what this means, it could be better to use "and so on" instead. Really, this is only justified because I also wanted to point out that the number of subtrees in the index format is used to determine when the recursive depth-first-search stack should be "popped." This should help to add clarity to the format. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:59 -08:00
Derrick Stolee	22ad8600c1	index-format: update preamble to cache tree extension I had difficulty in my efforts to learn about the cache tree extension based on the documentation and code because I had an incorrect assumption about how it behaved. This might be due to some ambiguity in the documentation, so this change modifies the beginning of the cache tree format by expanding the description of the feature. My hope is that this documentation clarifies a few things: 1. There is an in-memory recursive tree structure that is constructed from the extension data. This structure has a few differences, such as where the name is stored. 2. What does it mean for an entry to be invalid? 3. When exactly are "new" trees created? Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:46 -08:00
Derrick Stolee	845d15d4d0	index-format: use 'cache tree' over 'cached tree' The index has a "cache tree" extension. This corresponds to a significant API implemented in cache-tree.[ch]. However, there are a few places that refer to this erroneously as "cached tree". These are rare, but notably the index-format.txt file itself makes this error. The only other reference is in t7104-reset-hard.sh. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:38 -08:00
Derrick Stolee	0e5c950267	cache-tree: trace regions for prime_cache_tree Commands such as "git reset --hard" rebuild the in-memory representation of the cache tree index extension by parsing tree objects starting at a known root tree. The performance of this operation can vary widely depending on the width and depth of the repository's working directory structure. Measure the time in this operation using trace2 regions in prime_cache_tree(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:32 -08:00
Derrick Stolee	4c3e18723c	cache-tree: trace regions for I/O As we write or read the cache tree index extension, it can be good to isolate how much of the file I/O time is spent constructing this in-memory tree from the existing index or writing it out again to the new index file. Use trace2 regions to indicate that we are spending time on this operation. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:21 -08:00
Junio C Hamano	66e871b664	The third batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 21:48:47 -08:00
Junio C Hamano	49656f9445	Merge branch 'jc/macos-install-dependencies-fix' Fix for procedure to building CI test environment for mac. * jc/macos-install-dependencies-fix: ci/install-depends: attempt to fix "brew cask" stuff	2021-01-15 21:48:47 -08:00
Junio C Hamano	8782bfbf01	Merge branch 'tb/local-clone-race-doc' Doc update. * tb/local-clone-race-doc: Documentation/git-clone.txt: document race with --local	2021-01-15 21:48:47 -08:00
Junio C Hamano	644d85e751	Merge branch 'bc/doc-status-short' Doc update. * bc/doc-status-short: docs: rephrase and clarify the git status --short format	2021-01-15 21:48:47 -08:00
Junio C Hamano	453e149c8a	Merge branch 'dl/p4-encode-after-kw-expansion' Text encoding fix for "git p4". * dl/p4-encode-after-kw-expansion: git-p4: fix syncing file types with pattern	2021-01-15 21:48:47 -08:00
Junio C Hamano	cf2870adda	Merge branch 'ab/gettext-charset-comment-fix' Comments update. * ab/gettext-charset-comment-fix: gettext.c: remove/reword a mostly-useless comment Makefile: remove a warning about old GETTEXT_POISON flag	2021-01-15 21:48:46 -08:00
Junio C Hamano	eecc5f0775	Merge branch 'ug/doc-lose-dircache' Doc update. * ug/doc-lose-dircache: doc: remove "directory cache" from man pages	2021-01-15 21:48:46 -08:00
Junio C Hamano	d9e1cd555d	Merge branch 'ad/t4129-setfacl-target-fix' Test fix. * ad/t4129-setfacl-target-fix: t4129: fix setfacl-related permissions failure	2021-01-15 21:48:46 -08:00
Junio C Hamano	2b8cef2307	Merge branch 'jk/t5516-deflake' Test fix. * jk/t5516-deflake: t5516: loosen "not our ref" error check	2021-01-15 21:48:46 -08:00
Junio C Hamano	788f488b33	Merge branch 'vv/send-email-with-less-secure-apps-access' Doc update. * vv/send-email-with-less-secure-apps-access: git-send-email.txt: mention less secure app access with Gmail	2021-01-15 21:48:46 -08:00
Junio C Hamano	073552d7ae	Merge branch 'pb/mergetool-tool-help-fix' Fix 2.29 regression where "git mergetool --tool-help" fails to list all the available tools. * pb/mergetool-tool-help-fix: mergetool--lib: fix '--tool-help' to correctly show available tools	2021-01-15 21:48:46 -08:00
Junio C Hamano	aa08688362	Merge branch 'ds/for-each-repo-noopfix' "git for-each-repo --config=<var> <cmd>" should not run <cmd> for any repository when the configuration variable <var> is not defined even once. * ds/for-each-repo-noopfix: for-each-repo: do nothing on empty config	2021-01-15 21:48:46 -08:00
Junio C Hamano	6a393f36d9	Merge branch 'jc/sign-off' Doc update. * jc/sign-off: SubmittingPatches: tighten wording on "sign-off" procedure	2021-01-15 21:48:45 -08:00
Junio C Hamano	8dbabb31df	Merge branch 'mt/t4129-with-setgid-dir' Some tests expect that "ls -l" output has either '-' or 'x' for group executable bit, but setgid bit can be inherited from parent directory and make these fields 'S' or 's' instead, causing test failures. * mt/t4129-with-setgid-dir: t4129: don't fail if setgid is set in the test directory	2021-01-15 21:48:45 -08:00
Junio C Hamano	b2ace18759	Merge branch 'ds/maintenance-part-4' Follow-up on the "maintenance part-3" which introduced scheduled maintenance tasks to support platforms whose native scheduling methods are not 'cron'. * ds/maintenance-part-4: maintenance: use Windows scheduled tasks maintenance: use launchctl on macOS maintenance: include 'cron' details in docs maintenance: extract platform-specific scheduling	2021-01-15 21:48:45 -08:00
Junio C Hamano	4151fdb1c7	The second batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 15:20:30 -08:00
Junio C Hamano	f9fb9063fd	Merge branch 'fc/completion-aliases-support' Bash completion (in contrib/) update to make it easier for end-users to add completion for their custom "git" subcommands. * fc/completion-aliases-support: completion: add proper public __git_complete test: completion: add tests for __git_complete completion: bash: improve function detection completion: bash: add __git_have_func helper	2021-01-15 15:20:30 -08:00
Junio C Hamano	62fb47a4d3	Merge branch 'en/stash-apply-sparse-checkout' "git stash" did not work well in a sparsely checked out working tree. * en/stash-apply-sparse-checkout: stash: fix stash application in sparse-checkouts stash: remove unnecessary process forking t7012: add a testcase demonstrating stash apply bugs in sparse checkouts	2021-01-15 15:20:29 -08:00
Junio C Hamano	1ee70a916d	Merge branch 'ar/t6016-modernise' Test update. * ar/t6016-modernise: t6016: move to lib-log-graph.sh framework	2021-01-15 15:20:29 -08:00
Junio C Hamano	2ce8de6bf9	Merge branch 'zh/arg-help-format' Clean up option descriptions in "git cmd --help". * zh/arg-help-format: builtin/*: update usage format parse-options: format argh like error messages	2021-01-15 15:20:29 -08:00
Junio C Hamano	7bfa022993	Merge branch 'nk/perf-fsmonitor-cleanup' Test fix. * nk/perf-fsmonitor-cleanup: p7519: allow running without watchman prereq	2021-01-15 15:20:29 -08:00
Junio C Hamano	02feca721e	Merge branch 'ds/trace2-topo-walk' The topological walk codepath is covered by new trace2 stats. * ds/trace2-topo-walk: revision: trace topo-walk statistics	2021-01-15 15:20:29 -08:00
Junio C Hamano	df26861c56	Merge branch 'rs/rebase-commit-validation' Diagnose command line error of "git rebase" early. * rs/rebase-commit-validation: rebase: verify commit parameter	2021-01-15 15:20:29 -08:00
Junio C Hamano	8b327f1784	Merge branch 'ma/sha1-is-a-hash' Retire more names with "sha1" in it. * ma/sha1-is-a-hash: hash-lookup: rename from sha1-lookup sha1-lookup: rename `sha1_pos()` as `hash_pos()` object-file.c: rename from sha1-file.c object-name.c: rename from sha1-name.c	2021-01-15 15:20:29 -08:00
Junio C Hamano	16a8055dae	Merge branch 'ma/doc-pack-format-varint-for-sizes' Doc update. * ma/doc-pack-format-varint-for-sizes: pack-format.txt: document sizes at start of delta data	2021-01-15 15:20:29 -08:00
Junio C Hamano	a11571bb7f	Merge branch 'ma/t1300-cleanup' Code clean-up. * ma/t1300-cleanup: t1300: don't needlessly work with `core.foo` configs t1300: remove duplicate test for `--file no-such-file` t1300: remove duplicate test for `--file ../foo`	2021-01-15 15:20:28 -08:00
Junio C Hamano	40876260ef	Merge branch 'pb/doc-modules-git-work-tree-typofix' Doc fix. * pb/doc-modules-git-work-tree-typofix: gitmodules.txt: fix 'GIT_WORK_TREE' variable name	2021-01-15 15:20:28 -08:00
Junio C Hamano	b17eb5b4e4	Merge branch 'ta/doc-typofix' Doc fix. * ta/doc-typofix: doc: fix some typos	2021-01-15 15:20:28 -08:00
Junio C Hamano	9ba366f12b	Merge branch 'bc/rev-parse-path-format' "git rev-parse" can be explicitly told to give output as absolute or relative path with the `--path-format=(absolute\|relative)` option. * bc/rev-parse-path-format: rev-parse: add option for absolute or relative path formatting abspath: add a function to resolve paths with missing components	2021-01-15 15:20:28 -08:00
Junio C Hamano	6dbbae17d9	Merge branch 'ew/decline-core-abbrev' The configuration variable 'core.abbrev' can be set to 'no' to force no abbreviation regardless of the hash algorithm. * ew/decline-core-abbrev: core.abbrev=no disables abbreviations	2021-01-15 15:20:28 -08:00
Patrick Steinhardt	d8d77153ea	config: allow specifying config entries via envvar pairs While we currently have the `GIT_CONFIG_PARAMETERS` environment variable which can be used to pass runtime configuration data to git processes, it's an internal implementation detail and not supposed to be used by end users. Next to being for internal use only, this way of passing config entries has a major downside: the config keys need to be parsed as they contain both key and value in a single variable. As such, it is left to the user to escape any potentially harmful characters in the value, which is quite hard to do if values are controlled by a third party. This commit thus adds a new way of adding config entries via the environment which gets rid of this shortcoming. If the user passes the `GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each `i` in `[0,n)`. While the same can be achieved with `git -c <name>=<value>`, one may wish to not do so for potentially sensitive information. E.g. if one wants to set `http.extraHeader` to contain an authentication token, doing so via `-c` would trivially leak those credentials via e.g. ps(1), which typically also shows command arguments. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 13:03:45 -08:00
Patrick Steinhardt	b9d147fb15	environment: make `getenv_safe()` a public function The `getenv_safe()` helper function helps to safely retrieve multiple environment values without the need to depend on platform-specific behaviour for the return value's lifetime. We'll make use of this function in a following patch, so let's make it available by making it non-static and adding a declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 13:03:45 -08:00
Patrick Steinhardt	1ff21c05ba	config: store "git -c" variables using more robust format The previous commit added a new format for $GIT_CONFIG_PARAMETERS which is able to robustly handle subsections with "=" in them. Let's start writing the new format. Unfortunately, this does much less than you'd hope, because "git -c" itself has the same ambiguity problem! But it's still worth doing: - we've now pushed the problem from the inter-process communication into the "-c" command-line parser. This would free us up to later add an unambiguous format there (e.g., separate arguments like "git --config key value", etc). - for --config-env, the parser already disallows "=" in the environment variable name. So: git --config-env section.with=equals.key=ENVVAR will robustly set section.with=equals.key to the contents of $ENVVAR. The new test shows the improvement for --config-env. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 13:03:18 -08:00
Jeff King	f9dbb64fad	config: parse more robust format in GIT_CONFIG_PARAMETERS When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote each one as a single unit, like: 'section.one=value1' 'section.two=value2' On the reading side, we de-quote to get the individual strings, and then parse them by splitting on the first "=" we find. This format is ambiguous, because an "=" may appear in a subsection. So the config represented in a file by both: [section "subsection=with=equals"] key = value and: [section] subsection = with=equals.key=value ends up in this flattened format like: 'section.subsection=with=equals.key=value' and we can't tell which was desired. We have traditionally resolved this by taking the first "=" we see starting from the left, meaning that we allowed arbitrary content in the value, but not in the subsection. Let's make our environment format a bit more robust by separately quoting the key and value. That turns those examples into: 'section.subsection=with=equals.key'='value' and: 'section.subsection'='with=equals.key=value' respectively, and we can tell the difference between them. We can detect which format is in use for any given element of the list based on the presence of the unquoted "=". That means we can continue to allow the old format to work to support any callers which manually used the old format, and we can even intermingle the two formats. The old format wasn't documented, and nobody was supposed to be using it. But it's likely that such callers exist in the wild, so it's nice if we can avoid breaking them. Likewise, it may be possible to trigger an older version of "git -c" that runs a script that calls into a newer version of "git -c"; that new version would see the intermingled format. This does create one complication, which is that the obvious format in the new scheme for [section] some-bool is: 'section.some-bool' with no equals. We'd mistake that for an old-style variable. And it even has the same meaning in the old style, but: [section "with=equals"] some-bool does not. It would be: 'section.with=equals=some-bool' which we'd take to mean: [section] with = equals=some-bool in the old, ambiguous style. Likewise, we can't use: 'section.some-bool'='' because that's ambiguous with an actual empty string. Instead, we'll again use the shell-quoting to give us a hint, and use: 'section.some-bool'= to show that we have no value. Note that this commit just expands the reading side. We'll start writing the new format via "git -c" in a future patch. In the meantime, the existing "git -c" tests will make sure we didn't break reading the old format. But we'll also add some explicit coverage of the two formats to make sure we continue to handle the old one after we move the writing side over. And one final note: since we're now using the shell-quoting as a semantically meaningful hint, this closes the door to us ever allowing arbitrary shell quoting, like: 'a'shell'would'be'ok'with'this'.key=value But we have never supported that (only what sq_quote() would produce), and we are probably better off keeping things simple, robust, and backwards-compatible, than trying to make it easier for humans. We'll continue not to advertise the format of the variable to users, and instead keep "git -c" as the recommended mechanism for setting config (even if we are trying to be kind not to break users who may be relying on the current undocumented format). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 13:03:18 -08:00
Junio C Hamano	2d02bc91c0	t4203: make blame output massaging more robust In the "git blame --porcelain" output, lines that ends with three integers may not be the line that shows a commit object with line numbers and block length (the contents from the blamed file or the summary field can have a line that happens to match). Also, the names of the author may have more than three SP separated tokens ("git blame -L242,+1 `cf6de18aab` Documentation/SubmittingPatches" gives an example). The existing "grep -E \| cut" pipeline is a bit too loose on these two points. While they can be assumed on the test data, it is not so hard to use the right pattern from the documented format, so let's do so. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-14 21:54:52 -08:00
Philippe Blain	97f4b4c4e7	mailmap doc: use correct environment variable 'GIT_WORK_TREE' gitmailmap(5) uses 'GIT_WORK_DIR' to refer to the root of the repository, but this environment variable does not exist. Use the correct spelling for that variable, 'GIT_WORK_TREE'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-14 21:54:06 -08:00
Jeff King	779412b9d9	for_each_object_in_pack(): clarify pack vs index ordering We may return objects in one of two orders: how they appear in the .idx (sorted by object id) or how they appear in the packfile itself. To further complicate matters, we have two ordering variables, "i" and "pos", and it is not clear to which order they apply. Let's clarify this by using an unambiguous name where possible, and leaving a comment for the variable that does double-duty. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-14 18:22:27 -08:00
Denton Liu	afa80f534b	t4203: stop losing return codes of git commands In a pipe, only the return code of the last command is used. Thus, all other commands will have their return codes masked. Rewrite pipes so that there are no git commands upstream so that their failure is reported. Signed-off-by: Denton Liu <liu.denton@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-14 18:21:21 -08:00
Denton Liu	f9f30a0310	test-lib-functions.sh: fix usage for test_commit() The usage comment for test_commit() shows that the --author option should be given as `--author=<author>`. However, this is incorrect as it only works when given as `--author <author>`. Correct this erroneous text. Also, for the sake of correctness, fix the description as well since we invoke `git commit` with `--author <author>`, not `--author=<author>`. Signed-off-by: Denton Liu <liu.denton@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-14 18:21:03 -08:00
Christian Couder	7c99bc23fc	pack-write: die on error in write_promisor_file() write_promisor_file() already uses xfopen(), so it would die if the file cannot be opened for writing. To be consistent with this behavior and not overlook issues, let's also die if there are errors when we are actually writing to the file. Suggested-by: Jeff King <peff@peff.net> Suggested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-14 17:02:22 -08:00
Junio C Hamano	12aa5552a9	Merge branch 'en/ort-conflict-handling' into en/merge-ort-perf * en/ort-conflict-handling: merge-ort: add handling for different types of files at same path merge-ort: copy find_first_merges() implementation from merge-recursive.c merge-ort: implement format_commit() merge-ort: copy and adapt merge_submodule() from merge-recursive.c merge-ort: copy and adapt merge_3way() from merge-recursive.c merge-ort: flesh out implementation of handle_content_merge() merge-ort: handle book-keeping around two- and three-way content merge merge-ort: implement unique_path() helper merge-ort: handle directory/file conflicts that remain merge-ort: handle D/F conflict where directory disappears due to merge	2021-01-14 12:41:54 -08:00
Junio C Hamano	cafc587a1d	Merge branch 'en/diffcore-rename' into en/merge-ort-perf * en/diffcore-rename: diffcore-rename: remove unnecessary duplicate entry checks diffcore-rename: accelerate rename_dst setup diffcore-rename: simplify and accelerate register_rename_src() t4058: explore duplicate tree entry handling in a bit more detail t4058: add more tests and documentation for duplicate tree entry handling diffcore-rename: reduce jumpiness in progress counters diffcore-rename: simplify limit check diffcore-rename: avoid usage of global in too_many_rename_candidates() diffcore-rename: rename num_create to num_destinations	2021-01-14 12:41:45 -08:00
Taylor Blau	e5dcd78418	pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' To prepare for on-disk reverse indexes, remove a spot in 'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct packed_git'. Even though this use of the revindex pointer is within pack-revindex.c, this clean up is still worth doing. Since the 'revindex' pointer will be NULL when reading from an on-disk reverse index (instead the 'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this call-site would have to include a conditional to lookup the offset for position 'mi' each iteration through the search. So instead of open-coding 'pack_pos_to_offset()', call it directly from within 'offset_to_pack_pos()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:48 -08:00
Taylor Blau	d5bc7c60c7	pack-revindex: hide the definition of 'revindex_entry' Now that all spots outside of pack-revindex.c that reference 'struct revindex_entry' directly have been removed, it is safe to hide the implementation by moving it from pack-revindex.h to pack-revindex.c. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:48 -08:00
Taylor Blau	8389855a9b	pack-revindex: remove unused 'find_revindex_position()' Now that all 'find_revindex_position()' callers have been removed (and converted to the more descriptive 'offset_to_pack_pos()'), it is almost safe to get rid of 'find_revindex_position()' entirely. Almost, except for the fact that 'offset_to_pack_pos()' calls 'find_revindex_position()'. Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and then remove 'find_revindex_position()' entirely. This is a straightforward refactoring with one minor snag. 'offset_to_pack_pos()' used to load the index before calling 'find_revindex_position()'. That means that by the time 'find_revindex_position()' starts executing, 'p->num_objects' can be safely read. After inlining, be careful to not read 'p->num_objects' until _after_ 'load_pack_revindex()' (which loads the index as a side-effect) has been called. Another small fix that is included is converting the upper- and lower-bounds to be unsigned's instead of ints. This dates back to `92e5c77c37` (revindex: export new APIs, 2013-10-24)--ironically, the last time we introduced new APIs here--but this unifies the types. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:48 -08:00
Taylor Blau	1c3855f33b	pack-revindex: remove unused 'find_pack_revindex()' Now that no callers of 'find_pack_revindex()' remain, remove the function's declaration and implementation entirely. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:47 -08:00
Taylor Blau	2891b434ac	builtin/gc.c: guess the size of the revindex 'estimate_repack_memory()' takes into account the amount of memory required to load the reverse index in memory by multiplying the assumed number of objects by the size of the 'revindex_entry' struct. Prepare for hiding the definition of 'struct revindex_entry' by removing a 'sizeof()' of that type from outside of pack-revindex.c. Instead, guess that one off_t and one uint32_t are required per object. Strictly speaking, this is a worse guess than asking for 'sizeof(struct revindex_entry)' directly, since the true size of this struct is 16 bytes with padding on the end of the struct in order to align the offset field. But, this is an approximation anyway, and it does remove a use of the 'struct revindex_entry' from outside of pack-revindex internals. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:47 -08:00
Taylor Blau	b130aef65e	for_each_object_in_pack(): convert to new revindex API Avoid looking at the 'revindex' pointer directly and instead call 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:47 -08:00
Taylor Blau	0a7e3642bc	unpack_entry(): convert to new revindex API Remove direct manipulation of the 'struct revindex_entry' type as well as calls to the deprecated API in 'packfile.c:unpack_entry()'. Usual clean-up is performed (replacing '->nr' with calls to 'pack_pos_to_index()' and so on). Add an additional check to make sure that 'obj_offset()' points at a valid object. In the case this check is violated, we cannot call 'mark_bad_packed_object()' because we don't know the OID. At the top of the call stack is do_oid_object_info_extended() (via packed_object_info()), which does mark the object. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:47 -08:00
Taylor Blau	fc150caf67	packed_object_info(): convert to new revindex API Convert another call of 'find_pack_revindex()' to its replacement 'pack_pos_to_offset()'. Likewise: - Avoid manipulating `struct packed_git`'s `revindex` pointer directly by removing the pointer-as-array indexing. - Add an additional guard to check that the offset 'obj_offset()' points to a real object. This should be the case with well-behaved callers to 'packed_object_info()', but isn't guarenteed. Other blocks that fill in various other values from the 'struct object_info' request handle bad inputs by setting the type to 'OBJ_BAD' and jumping to 'out'. Do the same when given a bad offset here. The previous code would have segfaulted when given a bad 'obj_offset' value, since 'find_pack_revindex()' would return 'NULL', and then the line that fills 'oi->disk_sizep' would try to access 'NULL[1]' with a stride of 16 bytes (the width of 'struct revindex_entry)'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:47 -08:00
Taylor Blau	3a3f54dd0a	retry_bad_packed_offset(): convert to new revindex API Perform exactly the same conversion as in the previous commit to another caller within 'packfile.c'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:47 -08:00
Taylor Blau	45bef5c064	get_delta_base_oid(): convert to new revindex API Replace direct accesses to the 'struct revindex' type with a call to 'pack_pos_to_index()'. Likewise drop the old-style 'find_pack_revindex()' with its replacement 'offset_to_pack_pos()' (while continuing to perform the same error checking). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:46 -08:00
Taylor Blau	78232bf65d	rebuild_existing_bitmaps(): convert to new revindex API Remove another instance of looking at the revindex directly by instead calling 'pack_pos_to_index()'. Unlike other patches, this caller only cares about the index position of each object in the loop. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:46 -08:00
Taylor Blau	011f3fd5cd	try_partial_reuse(): convert to new revindex API Remove another instance of direct revindex manipulation by calling 'pack_pos_to_offset()' instead (the caller here does not care about the index position of the object at position 'pos'). Note that we cannot just use the existing "offset" variable to store the value we get from pack_pos_to_offset(). It is incremented by unpack_object_header(), but we later need the original value. Since we'll no longer have revindex->offset to read it from, we'll store that in a separate variable ("header" since it points to the entry's header bytes). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:46 -08:00
Taylor Blau	a78a90324d	get_size_by_pos(): convert to new revindex API Remove another caller that holds onto a 'struct revindex_entry' by replacing the direct indexing with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:46 -08:00
Taylor Blau	cf98f2e8e0	show_objects_for_type(): convert to new revindex API Avoid storing the revindex entry directly, since this structure will soon be removed from the public interface. Instead, store the offset and index position by calling 'pack_pos_to_offset()' and 'pack_pos_to_index()', respectively. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:46 -08:00
Taylor Blau	57665086af	bitmap_position_packfile(): convert to new revindex API Replace find_revindex_position() with its counterpart in the new API, offset_to_pack_pos(). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	eb3fd99efd	check_object(): convert to new revindex API Replace direct accesses to the revindex with calls to 'offset_to_pack_pos()' and 'pack_pos_to_index()'. Since this caller already had some error checking (it can jump to the 'give_up' label if it encounters an error), we can easily check whether or not the provided offset points to an object in the given pack. This error checking existed prior to this patch, too, since the caller checks whether the return value from 'find_pack_revindex()' was NULL or not. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	6a5c10c45f	write_reused_pack_verbatim(): convert to new revindex API Replace a direct access to the revindex array with 'pack_pos_to_offset()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	66cbd3e2fb	write_reused_pack_one(): convert to new revindex API Replace direct revindex accesses with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	952fc6870d	write_reuse_object(): convert to new revindex API First replace 'find_pack_revindex()' with its replacement 'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make its way through until 'write_reuse_object()' from causing a bad memory read (if 'revidx' is 'NULL') Next, replace a direct access of '->nr' with the wrapper function 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	f33fb6e419	pack-revindex: introduce a new API In the next several patches, we will prepare for loading a reverse index either in memory (mapping the inverse of the .idx's contents in-core), or directly from a yet-to-be-introduced on-disk format. To prepare for that, we'll introduce an API that avoids the caller explicitly indexing the revindex pointer in the packed_git structure. There are four ways to interact with the reverse index. Accordingly, four functions will be exported from 'pack-revindex.h' by the time that the existing API is removed. A caller may: 1. Load the pack's reverse index. This involves opening up the index, generating an array, and then sorting it. Since opening the index can fail, this function ('load_pack_revindex()') returns an int. Accordingly, it takes only a single argument: the 'struct packed_git' the caller wants to build a reverse index for. This function is well-suited for both the current and new API. Callers will have to continue to open the reverse index explicitly, but this function will eventually learn how to detect and load a reverse index from the on-disk format, if one exists. Otherwise, it will fallback to generating one in memory from scratch. 2. Convert a pack position into an offset. This operation is now called `pack_pos_to_offset()`. It takes a pack and a position, and returns the corresponding off_t. Any error simply calls BUG(), since the callers are not well-suited to handle a failure and keep going. 3. Convert a pack position into an index position. Same as above; this takes a pack and a position, and returns a uint32_t. This operation is known as `pack_pos_to_index()`. The same thinking about error conditions applies here as well. 4. Find the pack position for a given offset. This operation is now known as `offset_to_pack_pos()`. It takes a pack, an offset, and a pointer to a uint32_t where the position is written, if an object exists at that offset. Otherwise, -1 is returned to indicate failure. Unlike some of the callers that used to access '->offset' and '->nr' directly, the error checking around this call is somewhat more robust. This is important since callers should always pass an offset which points at the boundary of two objects. The API, unlike direct access, enforces that that is the case. This will become important in a subsequent patch where a caller which does not but could check the return value treats the signed `-1` from `find_revindex_position()` as an index into the 'revindex' array. Two design warts are carried over into the new API: - Asking for the index position of an out-of-bounds object will result in a BUG() (since no such object exists), but asking for the offset of the non-existent object at the end of the pack returns the total size of the pack. This makes it convenient for callers who always want to take the difference of two adjacent object's offsets (to compute the on-disk size) but don't want to worry about boundaries at the end of the pack. - offset_to_pack_pos() lazily loads the reverse index, but pack_pos_to_index() doesn't (callers of the former are well-suited to handle errors, but callers of the latter are not). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:44 -08:00
Ævar Arnfjörð Bjarmason	0d28d3cf33	CoC: update to version 2.0 + local changes Update the CoC added in `5cdf2301` (add a Code of Conduct document, 2019-09-24 from version 1.4 to version 2.0. This is the version found at [1] with the following minor changes: - We preserve the change to the CoC in `3f9ef874a7` (CODE_OF_CONDUCT: mention individual project-leader emails, 2019-09-26) - We preserve the custom intro added in `5cdf2301d4` (add a Code of Conduct document, 2019-09-24) This change intentionally preserves a warning emitted on "git diff --check". It's better to make it easily diff-able with upstream than to fix whitespace changes in our version while we're at it. 1. https://www.contributor-covenant.org/version/2/0/code_of_conduct/code_of_conduct.md Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Christian Couder <chriscool@tuxfamily.org> Acked-by: Derrick Stolee <dstolee@microsoft.com> Acked-by: Elijah Newren <newren@gmail.com> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: brian m. carlson <sandals@crustytoothpaste.net> Acked-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylor.com> Acked-by: Jonathan Tan <jonathantanmy@google.com> Acked-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 17:45:04 -08:00
Christian Couder	33add2ad7d	fetch-pack: refactor writing promisor file Let's replace the 2 different pieces of code that write a promisor file in 'builtin/repack.c' and 'fetch-pack.c' with a new function called 'write_promisor_file()' in 'pack-write.c' and 'pack.h'. This might also help us in the future, if we want to put back the ref names and associated hashes that were in the promisor files we are repacking in 'builtin/repack.c' as suggested by a NEEDSWORK comment just above the code we are refactoring. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 16:01:07 -08:00
Christian Couder	9d7fa3be31	fetch-pack: rename helper to create_promisor_file() As we are going to refactor the code that actually writes the promisor file into a separate function in a following commit, let's rename the current write_promisor_file() function to create_promisor_file(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 16:01:07 -08:00
Ævar Arnfjörð Bjarmason	4e168333a8	shortlog: remove unused(?) "repo-abbrev" feature Remove support for the magical "repo-abbrev" comment in .mailmap files. This was added to .mailmap parsing in [1], as a generalized feature of the git-shortlog Perl script added earlier in [2]. There was no documentation or tests for this feature, and I don't think it's used in practice anymore. What it did was to allow you to specify a single string to be search-replaced with "/.../" in the .mailmap file. E.g. for linux.git's current .mailmap: git archive --remote=git@gitlab.com:linux-kernel/linux.git \ HEAD -- .mailmap \| grep -a repo-abbrev # repo-abbrev: /pub/scm/linux/kernel/git/ Then when running e.g.: git shortlog --merges --author=Linus -1 v5.10-rc7..v5.10 \| grep Merge We'd emit (the [...] is mine): Merge tag [...]git://git.kernel.org/.../tip/tip But will now emit: Merge tag [...]git.kernel.org/pub/scm/linux/kernel/git/tip/tip I think at this point this is just a historical artifact we can get rid of. It was initially meant for Linus's own use when we integrated the Perl script[2], but since then it seems he's stopped using it. Digging through Linus's release announcements on the LKML[3] the last release I can find that made use of this output is Linux 2.6.25-rc6 back in March 2008[4]. Later on Linus started using --no-merges[5], and nowadays seems to prefer some custom not-quite-shortlog format of merges from lieutenants[6]. You will still see it on linux.git if you run "git shortlog" manually yourself with --merges, with this removed you can still get the same output with: git log --pretty=fuller v5.10-rc7..v5.10 \| sed 's!/pub/scm/linux/kernel/git/!/.../!g' \| git shortlog Arguably we should do the same for the search-replacing of "[PATCH]" at the beginning with "". That seems to be another relic of a bygone era when linux.git patches would have their E-Mail subject lines applied as-is by "git am" or whatever. But we documented that feature in "git-shortlog(1)", and it seems more widely applicable than something purely kernel-specific. 1. `7595e2ee6e` (git-shortlog: make common repository prefix configurable with .mailmap, 2006-11-25) 2. `fa375c7f1b` (Add git-shortlog perl script, 2005-06-04) 3. https://lore.kernel.org/lkml/ 4. https://lore.kernel.org/lkml/alpine.LFD.1.00.0803161651350.3020@woody.linux-foundation.org/ 5. https://lore.kernel.org/lkml/BANLkTinrbh7Xi27an3uY7pDWrNKhJRYmEA@mail.gmail.com/ 6. https://lore.kernel.org/lkml/CAHk-=wg1+kf1AVzXA-RQX0zjM6t9J2Kay9xyuNqcFHWV-y5ZYw@mail.gmail.com/ Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:42 -08:00
Ævar Arnfjörð Bjarmason	238803cb40	mailmap doc + tests: document and test for case-insensitivity Add documentation and more tests for case-insensitivity. The existing test only matched on the E-Mail part, but as shown here we also match the name with strcasecmp(). This behavior was last discussed on the mailing list in the thread starting at [1]. It seems we're keeping it like this, so let's document it. 1. https://lore.kernel.org/git/87czykvg19.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:42 -08:00
Ævar Arnfjörð Bjarmason	34986b773a	mailmap tests: add tests for empty "<>" syntax Add tests for mailmap's handling of "<>", which is allowed on the RHS, but not the LHS of a "<LHS> <RHS>" pair. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:42 -08:00
Ævar Arnfjörð Bjarmason	9e2a14a889	mailmap tests: add tests for whitespace syntax Add tests for mailmap's handling of whitespace, i.e. how it trims space within "<>" and around author names. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:42 -08:00
Ævar Arnfjörð Bjarmason	9b391b09a0	mailmap tests: add a test for comment syntax Add a test for mailmap comment syntax. As noted in [1] there was no test coverage for this. Let's make sure a future change doesn't break it. 1. https://lore.kernel.org/git/CAN0heSoKYWXqskCR=GPreSHc6twCSo1345WTmiPdrR57XSShhA@mail.gmail.com/ Reported-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:42 -08:00
Ævar Arnfjörð Bjarmason	05b5ff219c	mailmap doc + tests: add better examples & test them Change the mailmap documentation added in `0925ce4d49` (Add map_user() and clear_mailmap() to mailmap, 2009-02-08) to continue discussing the Jane/Joe example. I think this makes things a lot less confusing as we're building up more complex examples using one set of data which covers all the things we'd like to discuss. Also add tests to assert that what our documentation says is what's actually happening. This is mostly (or entirely) covered by existing tests which I'm not deleting, but having these tests for the synopsis makes it easier to follow-along while reading the tests & docs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:42 -08:00
Ævar Arnfjörð Bjarmason	f5d79bf7dd	tests: refactor a few tests to use "test_commit --append" Refactor a few more tests to use the new "--append" option to "test_commit". I added it for use in the mailmap tests, but this demonstrates how useful it is in general. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	3373518cc8	test-lib functions: add an --append option to test_commit Add an --append option to test_commit to append <contents> to the <file> we're writing to. This simplifies a lot of test setup, as shown in some of the tests being changed here. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	999cfc4f45	test-lib functions: add --author support to test_commit Add support for --author to "test_commit". This will simplify some current and future tests, one of those is being changed here. Let's also line-wrap the "git commit" command invocation to make diffs that add subsequent options easier to add, as they'll only need to add a new option line. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	76b8b8d05c	test-lib functions: document arguments to test_commit The --notick argument was added in [1] and was followed by --signoff in [2], but neither of these commits added any documentation for these options. When -C was added in [3] a comment was added to document it, but not the other options. Let's document all of these options. 1. `44b85e89d7` (t7003: add test to filter a branch with a commit at epoch, 2012-07-12), 2. `5ed75e2a3f` (cherry-pick: don't forget -s on failure, 2012-09-14). 3. `6f94351b0a` (test-lib-functions.sh: teach test_commit -C <dir>, 2016-12-08) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	f21426e189	test-lib functions: expand "test_commit" comment template Expand the comment template for "test_commit" to match that of "test_commit_bulk" added in `b1c36cb849` (test-lib: introduce test_commit_bulk, 2019-07-02). It has several undocumented options, which won't all fit on one line. Follow-up commit(s) will document them. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	56ac194e1d	mailmap: test for silent exiting on missing file/blob That we silently ignore missing mailmap.file or mailmap.blob values is intentional. See `938a60d64f` (mailmap: clean up read_mailmap error handling, 2012-12-12). However, nothing tested for this. Let's do that by checking that stderr is empty in those cases. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	c1fe7fd7e3	mailmap tests: get rid of overly complex blame fuzzing Change a test that used a custom fuzzing function since `bfdfa3d414` (t4203 (mailmap): stop hardcoding commit ids and dates, 2010-10-15) to just use the "blame --porcelain" output instead. We could use the same pattern as `0ba9c9a0fb` (t8008: rely on rev-parse'd HEAD instead of sha1 value, 2017-07-26) does to do this, but there wouldn't be any point. We're not trying to test "blame" output here in general, just that "blame" pays attention to the mailmap. So it's sufficient to get the blamed line(s) and authors from the output, which is much easier with the "--porcelain" option. It would still be possible for there to be a bug in "blame" such that it uses the mailmap for its "--porcelain" output, but not the regular output. Let's test for that simply by checking if specifying the mailmap changes the output. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:41 -08:00
Ævar Arnfjörð Bjarmason	400d160e39	mailmap tests: add a test for "not a blob" error Add a test for one of the error conditions added in `938a60d64f` (mailmap: clean up read_mailmap error handling, 2012-12-12). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	fb3bbe4ea3	mailmap tests: remove redundant entry in test Remove a redundant line in a test added in `d20d654fe8` (Change current mailmap usage to do matching on both name and email of author/committer., 2009-02-08). This didn't conceivably test anything useful and is most likely a copy/paste error. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	1db421ab85	mailmap tests: improve --stdin tests The --stdin tests setup the "contact" file in the main setup, let's instead set it up in the test that uses it. Also refactor the first test so it's obvious that the point of it is that "check-mailmap" will spew its input as-is when given no argument. For that one we can just use the "expect" file as-is. Also add tests for how other "--stdin" cases are handled, e.g. one where we actually do a mapping. For the rest of --stdin testing we just assume we're going to get the same output. We could follow-up and make sure everything's round-tripped through both --stdin and the file/blob backends, but I don't think there's much point in that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	e9931ace4f	mailmap tests: modernize syntax & test idioms Refactor the mailmap tests to: * Setup "actual" test files in the body of "test_expect_success" * Don't have X of "test_expect_success X Y" be an unquoted string. * Not to carry over test config between tests, and instead use "test_config". * Replace various "echo" a line-at-a-time patterns with here-docs. * Change a case of "log.mailmap=False" to use the lower-case "false". Both work, but this ends up in git-config's boolean parsing and these atypical values are tested for elsewhere. Let's use the lower-case to not draw the reader's attention to this abnormality. * Remove commentary asserting that things work a given way in favor of simply testing for it, i.e. in the case of a .mailmap file outside of the repository. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	9aaeac9cf7	mailmap tests: use our preferred whitespace syntax Change these tests to use the preferred whitespace around ">", "<<-EOF" etc. This is an initial step in larger and more meaningful refactoring of the file, which makes a subsequent commit easier to read. I'm not changing the whitespace of "echo <str> > file" patterns to "echo <str> >file" because all of those will be changed to here-docs in a subsequent commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	fcafb75382	mailmap doc: start by mentioning the comment syntax Mentioning the comment syntax and blank line support first is in line with how "git help config" describes its format. See `b8936cf060` (config.txt grammar, typo, and asciidoc fixes, 2006-06-08) for the paragraph I'm copying & amending from its documentation. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	6646cca892	check-mailmap doc: note config options Add a passing mention of the mailmap.file and mailmap.blob configuration options. Before this addition a reader of the "check-mailmap" manpage would have no idea that a custom map could be specified, unless they'd happen to e.g. come across it in the "config" manpage first. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	4f2ee994f3	mailmap doc: quote config variables `like.this` Quote the mailmap.file and mailmap.blob configuration variables as `mailmap.file` and `mailmap.blob`, and link to git-config(1). This is in line with the preferred way of doing this in the rest of our documentation. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:40 -08:00
Ævar Arnfjörð Bjarmason	42957af027	mailmap doc: create a new "gitmailmap(5)" man page Create a gitmailmap(5) page similar to how .gitmodules and .gitignore have their own pages at gitmodules(5) and gitignore(5). Now instead of "check-mailmap", "blame" and "shortlog" documentation including the description of the format we link to one canonical place. This makes things easier for readers, since in our manpage or web-based[1] output it's not clear that the "MAPPING AUTHORS" sections aren't subtly different, as opposed to just included. 1. https://git-scm.com/docs/git-check-mailmap Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 14:04:39 -08:00
Patrick Steinhardt	c7b190dabd	fetch: implement support for atomic reference updates When executing a fetch, then git will currently allocate one reference transaction per reference update and directly commit it. This means that fetches are non-atomic: even if some of the reference updates fail, others may still succeed and modify local references. This is fine in many scenarios, but this strategy has its downsides. - The view of remote references may be inconsistent and may show a bastardized state of the remote repository. - Batching together updates may improve performance in certain scenarios. While the impact probably isn't as pronounced with loose references, the upcoming reftable backend may benefit as it needs to write less files in case the update is batched. - The reference-update hook is currently being executed twice per updated reference. While this doesn't matter when there is no such hook, we have seen severe performance regressions when doing a git-fetch(1) with reference-transaction hook when the remote repository has hundreds of thousands of references. Similar to `git push --atomic`, this commit thus introduces atomic fetches. Instead of allocating one reference transaction per updated reference, it causes us to only allocate a single transaction and commit it as soon as all updates were received. If locking of any reference fails, then we abort the complete transaction and don't update any reference, which gives us an all-or-nothing fetch. Note that this may not completely fix the first of above downsides, as the consistent view also depends on the server-side. If the server doesn't have a consistent view of its own references during the reference negotiation phase, then the client would get the same inconsistent view the server has. This is a separate problem though and, if it actually exists, can be fixed at a later point. This commit also changes the way we write FETCH_HEAD in case `--atomic` is passed. Instead of writing changes as we go, we need to accumulate all changes first and only commit them at the end when we know that all reference updates succeeded. Ideally, we'd just do so via a temporary file so that we don't need to carry all updates in-memory. This isn't trivially doable though considering the `--append` mode, where we do not truncate the file but simply append to it. And given that we support concurrent processes appending to FETCH_HEAD at the same time without any loss of data, seeding the temporary file with current contents of FETCH_HEAD initially and then doing a rename wouldn't work either. So this commit implements the simple strategy of buffering all changes and appending them to the file on commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:06:15 -08:00
Patrick Steinhardt	d4c8db8f1b	fetch: allow passing a transaction to `s_update_ref()` The handling of ref updates is completely handled by `s_update_ref()`, which will manage the complete lifecycle of the reference transaction. This is fine right now given that git-fetch(1) does not support atomic fetches, so each reference gets its own transaction. It is quite inflexible though, as `s_update_ref()` only knows about a single reference update at a time, so it doesn't allow us to alter the strategy. This commit prepares `s_update_ref()` and its only caller `update_local_ref()` to allow passing an external transaction. If none is given, then the existing behaviour is triggered which creates a new transaction and directly commits it. Otherwise, if the caller provides a transaction, then we only queue the update but don't commit it. This optionally allows the caller to manage when a transaction will be committed. Given that `update_local_ref()` is always called with a `NULL` transaction for now, no change in behaviour is expected from this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:06:15 -08:00
Patrick Steinhardt	c45889f104	fetch: refactor `s_update_ref` to use common exit path The cleanup code in `s_update_ref()` is currently duplicated for both succesful and erroneous exit paths. This commit refactors the function to have a shared exit path for both cases to remove the duplication. Suggested-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:06:15 -08:00
Patrick Steinhardt	929d044575	fetch: use strbuf to format FETCH_HEAD updates This commit refactors `append_fetch_head()` to use a `struct strbuf` for formatting the update which we're about to append to the FETCH_HEAD file. While the refactoring doesn't have much of a benefit right now, it serves as a preparatory step to implement atomic fetches where we need to buffer all updates to FETCH_HEAD and only flush them out if all reference updates succeeded. No change in behaviour is expected from this commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:06:14 -08:00
Patrick Steinhardt	58a646a368	fetch: extract writing to FETCH_HEAD When performing a fetch with the default `--write-fetch-head` option, we write all updated references to FETCH_HEAD while the updates are performed. Given that updates are not performed atomically, it means that we we write to FETCH_HEAD even if some or all of the reference updates fail. Given that we simply update FETCH_HEAD ad-hoc with each reference, the logic is completely contained in `store_update_refs` and thus quite hard to extend. This can already be seen by the way we skip writing to the FETCH_HEAD: instead of having a conditional which simply skips writing, we instead open "/dev/null" and needlessly write all updates there. We are about to extend git-fetch(1) to accept an `--atomic` flag which will make the fetch an all-or-nothing operation with regards to the reference updates. This will also require us to make the updates to FETCH_HEAD an all-or-nothing operation, but as explained doing so is not easy with the current layout. This commit thus refactors the wa we write to FETCH_HEAD and pulls out the logic to open, append to, commit and close the file. While this may seem rather over-the top at first, pulling out this logic will make it a lot easier to update the code in a subsequent commit. It also allows us to easily skip writing completely in case `--no-write-fetch-head` was passed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:06:14 -08:00
Patrick Steinhardt	b342ae61b3	config: extract function to parse config pairs The function `git_config_parse_parameter` is responsible for parsing a `foo.bar=baz`-formatted configuration key, sanitizing the key and then processing it via the given callback function. Given that we're about to add a second user which is going to process keys which already has keys and values separated, this commit extracts a function `config_parse_pair` which only does the sanitization and processing part as a preparatory step. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:03:18 -08:00
Jeff King	13c44953fb	quote: make sq_dequote_step() a public function We provide a function for dequoting an entire string, as well as one for handling a space-separated list of quoted strings. But there's no way for a caller to parse a string like 'foo'='bar', even though it is easy to generate one using sq_quote_buf() or similar. Let's make the single-step function available to callers outside of quote.c. Note that we do need to adjust its implementation slightly: it insists on seeing whitespace between items, and we'd like to be more flexible than that. Since it only has a single caller, we can move that check (and slurping up any extra whitespace) into that caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:03:18 -08:00
Patrick Steinhardt	ce81b1da23	config: add new way to pass config via `--config-env` While it's already possible to pass runtime configuration via `git -c <key>=<value>`, it may be undesirable to use when the value contains sensitive information. E.g. if one wants to set `http.extraHeader` to contain an authentication token, doing so via `-c` would trivially leak those credentials via e.g. ps(1), which typically also shows command arguments. To enable this usecase without leaking credentials, this commit introduces a new switch `--config-env=<key>=<envvar>`. Instead of directly passing a value for the given key, it instead allows the user to specify the name of an environment variable. The value of that variable will then be used as value of the key. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 12:03:18 -08:00
Jiang Xin	5bb0fd2cab	bundle: arguments can be read from stdin In order to create an incremental bundle, we need to pass many arguments to let git-bundle ignore some already packed commits. It will be more convenient to pass args via stdin. But the current implementation does not allow us to do this. This is because args are parsed twice when creating bundle. The first time for parsing args is in `compute_and_write_prerequisites()` by running `git-rev-list` command to write prerequisites in bundle file, and stdin is consumed in this step if "--stdin" option is provided for `git-bundle`. Later nothing can be read from stdin when running `setup_revisions()` in `create_bundle()`. The solution is to parse args once by removing the entire function `compute_and_write_prerequisites()` and then calling function `setup_revisions()`. In order to write prerequisites for bundle, will call `prepare_revision_walk()` and `traverse_commit_list()`. But after calling `prepare_revision_walk()`, the object array `revs.pending` is left empty, and the following steps could not work properly with the empty object array (`revs.pending`). Therefore, make a copy of `revs` to `revs_copy` for later use right after calling `setup_revisions()`. The copy of `revs_copy` is not a deep copy, it shares the same objects with `revs`. The object array of `revs` has been cleared, but objects themselves are still kept. Flags of objects may change after calling `prepare_revision_walk()`, we can use these changed flags without calling the `git rev-list` command and parsing its output like the former implementation. Also add testcases for git bundle in t6020, which read args from stdin. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-11 21:50:41 -08:00
Jiang Xin	ce1d6d9f16	bundle: lost objects when removing duplicate pendings `git rev-list` will list one commit for the following command: $ git rev-list 'main^!' <tip-commit-of-main-branch> But providing the same rev-list args to `git bundle`, fail to create a bundle file. $ git bundle create - 'main^!' # v2 git bundle -<OID> <one-line-message> fatal: Refusing to create empty bundle. This is because when removing duplicate objects in function `object_array_remove_duplicates()`, one unique pending object which has the same name is deleted by mistake. The revision arg 'main^!' in the above example is parsed by `handle_revision_arg()`, and at lease two different objects will be appended to `revs.pending`, one points to the parent commit of the "main" branch, and the other points to the tip commit of the "main" branch. These two objects have the same name "main". Only one object is left with the name "main" after calling the function `object_array_remove_duplicates()`. And what's worse, when adding boundary commits into pending list, we use one-line commit message as names, and the arbitory names may surprise git-bundle. Only comparing objects themselves (".item") is also not good enough, because user may want to create a bundle with two identical objects but with different reference names, such as: "HEAD" and "refs/heads/main". Add new function `contains_object()` which compare both the address and the name of the object. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-11 21:50:41 -08:00
Jiang Xin	9901164d81	test: add helper functions for git-bundle Move git-bundle related functions from t5510 to a library, and this lib will be shared with a new testcase t6020 which finds a known breakage of "git-bundle". Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-11 21:50:41 -08:00
Denton Liu	6436a20284	refs: allow @{n} to work with n-sized reflog This sequence works $ git checkout -b newbranch $ git commit --allow-empty -m one $ git show -s newbranch@{1} and shows the state that was immediately after the newbranch was created. But then if you do $ git reflog expire --expire=now refs/heads/newbranch $ git commit --allow=empty -m two $ git show -s newbranch@{1} you'd be scolded with fatal: log for 'newbranch' only has 1 entries While it is true that it has only 1 entry, we have enough information in that single entry that records the transition between the state in which the tip of the branch was pointing at commit 'one' to the new commit 'two' built on it, so we should be able to answer "what object newbranch was pointing at?". But we refuse to do so. Make @{0} the special case where we use the new side to look up that entry. Otherwise, look up @{n} using the old side of the (n-1)th entry of the reflog. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-11 14:13:50 -08:00
Denton Liu	95c2a71820	refs: factor out set_read_ref_cutoffs() This block of code is duplicated twice. In a future commit, it will be duplicated for a third time. Factor out the common functionality into set_read_ref_cutoffs(). In the case of read_ref_at_ent(), we are incrementing `cb->reccnt` at the beginning of the function. Move these to right before the return so that the `cb->reccnt - 1` is changed to `cb->reccnt` and it can be cleanly factored out into set_read_ref_cutoffs(). The duplication of the increment statements will be removed in a future patch. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-10 12:24:00 -08:00
SZEDER Gábor	e3f5da7e60	t7800-difftool: don't accidentally match tmp dirs In a bunch of test cases in 't7800-difftool.sh' we 'grep' for specific filenames in 'git difftool's output, and those test cases are prone to occasional failures because those filenames might be part of the name of difftool's temporary directory as well, e.g.: +git difftool --dir-diff --no-symlinks --extcmd ls v1 +grep sub output +test_line_count = 2 sub-output test_line_count: line count for sub-output != 2 /tmp/git-difftool.Ssubfq/left/: sub /tmp/git-difftool.Ssubfq/right/: sub error: last command exited with $?=1 not ok 50 - difftool --dir-diff v1 from subdirectory --no-symlinks Fix this by tightening the 'grep' patterns looking for those interesting filenames to match only lines where a filename stands on its own. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-09 13:40:32 -08:00
Elijah Newren	eb3e3e1ddf	merge-ort: collect which directories are removed in dirs_removed Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-07 15:30:03 -08:00
Elijah Newren	f5d9fbc2e9	merge-ort: initialize and free new directory rename data structures Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-07 15:30:03 -08:00
Elijah Newren	c09376d55f	merge-ort: add new data structures for directory rename detection Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-07 15:30:02 -08:00
Junio C Hamano	8f894b2263	Merge branch 'en/merge-ort-3' into en/ort-directory-rename * en/merge-ort-3: merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions merge-ort: add implementation of rename/delete conflicts merge-ort: add implementation of both sides renaming differently merge-ort: add implementation of both sides renaming identically merge-ort: add basic outline for process_renames() merge-ort: implement compare_pairs() and collect_renames() merge-ort: implement detect_regular_renames() merge-ort: add initial outline for basic rename detection merge-ort: add basic data structures for handling renames	2021-01-07 15:29:49 -08:00
Junio C Hamano	72c4083ddf	The first batch in 2.31 cycle Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-06 23:33:44 -08:00
Junio C Hamano	d3aff11c3e	Merge branch 'es/perf-export-fix' Tweak unneeded recursion from a test framework helper function. * es/perf-export-fix: t/perf: avoid unnecessary test_export() recursion	2021-01-06 23:33:44 -08:00
Junio C Hamano	cf4b0714f7	Merge branch 'fc/t6030-bisect-reset-removes-auxiliary-files' A 3-year old test that was not testing anything useful has been corrected. * fc/t6030-bisect-reset-removes-auxiliary-files: test: bisect-porcelain: fix location of files	2021-01-06 23:33:44 -08:00
Junio C Hamano	8664fcb83b	Merge branch 'es/worktree-repair-both-moved' "git worktree repair" learned to deal with the case where both the repository and the worktree moved. * es/worktree-repair-both-moved: worktree: teach `repair` to fix multi-directional breakage	2021-01-06 23:33:44 -08:00
Junio C Hamano	45a177069f	Merge branch 'en/merge-ort-recursive' The ORT merge strategy learned to synthesize virtual ancestor tree by recursively merging multiple merge bases together, just like the recursive backend has done for years. * en/merge-ort-recursive: merge-ort: implement merge_incore_recursive() merge-ort: make clear_internal_opts() aware of partial clearing merge-ort: copy a few small helper functions from merge-recursive.c commit: move reverse_commit_list() from merge-recursive	2021-01-06 23:33:44 -08:00
Junio C Hamano	d3fa84d528	Merge branch 'fc/pull-merge-rebase' When a user does not tell "git pull" to use rebase or merge, the command gives a loud message telling a user to choose between rebase or merge but creates a merge anyway, forcing users who would want to rebase to redo the operation. Fix an early part of this problem by tightening the condition to give the message---there is no reason to stop or force the user to choose between rebase or merge if the history fast-forwards. * fc/pull-merge-rebase: pull: display default warning only when non-ff pull: correct condition to trigger non-ff advice pull: get rid of unnecessary global variable pull: give the advice for choosing rebase/merge much later pull: refactor fast-forward check	2021-01-06 23:33:44 -08:00
Junio C Hamano	85cf82ff01	Merge branch 'en/merge-ort-2' More "ORT" merge strategy. * en/merge-ort-2: merge-ort: add modify/delete handling and delayed output processing merge-ort: add die-not-implemented stub handle_content_merge() function merge-ort: add function grouping comments merge-ort: add a paths_to_free field to merge_options_internal merge-ort: add a path_conflict field to merge_options_internal merge-ort: add a clear_internal_opts helper merge-ort: add a few includes	2021-01-06 23:33:44 -08:00
Junio C Hamano	f9d29daba6	Merge branch 'en/merge-ort-impl' The merge backend "done right" starts to emerge. * en/merge-ort-impl: merge-ort: free data structures in merge_finalize() merge-ort: add implementation of record_conflicted_index_entries() tree: enable cmp_cache_name_compare() to be used elsewhere merge-ort: add implementation of checkout() merge-ort: basic outline for merge_switch_to_result() merge-ort: step 3 of tree writing -- handling subdirectories as we go merge-ort: step 2 of tree writing -- function to create tree object merge-ort: step 1 of tree writing -- record basenames, modes, and oids merge-ort: have process_entries operate in a defined order merge-ort: add a preliminary simple process_entries() implementation merge-ort: avoid recursing into identical trees merge-ort: record stage and auxiliary info for every path merge-ort: compute a few more useful fields for collect_merge_info merge-ort: avoid repeating fill_tree_descriptor() on the same tree merge-ort: implement a very basic collect_merge_info() merge-ort: add an err() function similar to one from merge-recursive merge-ort: use histogram diff merge-ort: port merge_start() from merge-recursive merge-ort: add some high-level algorithm structure merge-ort: setup basic internal data structures	2021-01-06 23:33:43 -08:00
Junio C Hamano	c256631065	Merge branch 'tb/pack-bitmap' Various improvements to the codepath that writes out pack bitmaps. * tb/pack-bitmap: (24 commits) pack-bitmap-write: better reuse bitmaps pack-bitmap-write: relax unique revwalk condition pack-bitmap-write: use existing bitmaps pack-bitmap: factor out 'add_commit_to_bitmap()' pack-bitmap: factor out 'bitmap_for_commit()' pack-bitmap-write: ignore BITMAP_FLAG_REUSE pack-bitmap-write: build fewer intermediate bitmaps pack-bitmap.c: check reads more aggressively when loading pack-bitmap-write: rename children to reverse_edges t5310: add branch-based checks commit: implement commit_list_contains() bitmap: implement bitmap_is_subset() pack-bitmap-write: fill bitmap with commit history pack-bitmap-write: pass ownership of intermediate bitmaps pack-bitmap-write: reimplement bitmap writing ewah: add bitmap_dup() function ewah: implement bitmap_or() ewah: make bitmap growth less aggressive ewah: factor out bitmap growth rev-list: die when --test-bitmap detects a mismatch ...	2021-01-06 23:33:43 -08:00
Junio C Hamano	b62bbd3580	Merge branch 'ab/trailers-extra-format' The "--format=%(trailers)" mechanism gets enhanced to make it easier to design output for machine consumption. * ab/trailers-extra-format: pretty format %(trailers): add a "key_value_separator" pretty format %(trailers): add a "keyonly" pretty-format %(trailers): fix broken standalone "valueonly" pretty format %(trailers) doc: avoid repetition pretty format %(trailers) test: split a long line	2021-01-06 23:33:43 -08:00
Junio C Hamano	c977ff4407	Merge branch 'pk/subsub-fetch-fix-take-2' "git fetch --recurse-submodules" fix (second attempt). * pk/subsub-fetch-fix-take-2: submodules: fix of regression on fetching of non-init subsub-repo	2021-01-06 23:33:43 -08:00
Patrick Steinhardt	b0812b6ac0	git: add `--super-prefix` to usage string When the `--super-prefix` option was implmented in `74866d7579` (git: make super-prefix option, 2016-10-07), its existence was only documented in the manpage but not in the command's own usage string. Given that the commit message didn't mention that this was done intentionally and given that it's documented in the manpage, this seems like an oversight. Add it to the usage string to fix the inconsistency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-06 22:55:06 -08:00
Ævar Arnfjörð Bjarmason	06ce79152b	mktag: add a --[no-]strict option Now that mktag has been migrated to use the fsck machinery to check its input, it makes sense to teach it to run in the equivalent of "git fsck"'s default mode. For cases where mktag is used to (re)create a tag object using data from an existing and malformed tag object, the validation may optionally have to be loosened. Teach the command to take the "--[no-]strict" option to do so. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-06 14:22:24 -08:00
Ævar Arnfjörð Bjarmason	2aa9425fbe	mktag: mark strings for translation Mark the errors mktag might emit for translation. This is a plumbing command, but the errors it emits are intended to be human-readable. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	3f390a366c	mktag: convert to parse-options Convert the "mktag" command to use parse-options.h instead of its own ad-hoc argc handling. This doesn't matter much in practice since it doesn't support any options, but removes another special-case in our codebase, and makes it easier to add options to it in the future. It does marginally improve the situation for programs that want to execute git commands in a consistent manner and e.g. always use --end-of-options. E.g. "gitaly" does that, and has a blacklist of built-ins that don't support --end-of-options. This is one less special case for it and other similar programs to support. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	9a1a3a4d4c	mktag: allow omitting the header/body \n separator Change mktag's acceptance rules to accept an empty body without an empty line after the header again. This fixes an ancient unintended dregression in "mktag". When "mktag" was introduced in `ec4465adb3` (Add "tag" objects that can be used to sign other objects., 2005-04-25) the input checks were much looser. When it was documented it `6cfec03680` (mktag: minimally update the description., 2007-06-10) it was clearly intended for this \n to be optional: The message, when [it] exists, is separated by a blank line from the header. But then in `e0aaf781f6` (mktag.c: improve verification of tagger field and tests, 2008-03-27) this was made an error, seemingly by accident. It was just a result of the general header checks, and all the tests after that patch have a trailing empty line (but did not before). Let's allow this again, and tweak the test semantics changed in `e0aaf781f6` to remove the redundant empty line. New tests added in previous commits of mine already added an explicit test for allowing the empty line between header and body. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	acfc01332b	mktag: allow turning off fsck.extraHeaderEntry In earlier commits mktag learned to use the fsck machinery, at which point we needed to add fsck.extraHeaderEntry so it could be as strict about extra headers as it's been ever since it was implemented. But it's not nice to need to switch away from "mktag" to "hash-object" + manual "fsck" just because you'd like to have an extra header. So let's support turning it off by getting "fsck.*" variables from the config. Pedantically speaking it's still not possible to make "mktag" behave just like "hash-object -t tag" does, since we're unconditionally going to check the referenced object in verify_object_in_tag(), which is our own check, and not one that exists in fsck.c. But the spirit of "this works like fsck" is preserved, in that if you created such a tag with "hash-object" and did a full "fsck" on the repository it would also error out about that invalid object, it just wouldn't emit the same message as fsck does. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	1f3299fda9	fsck: make fsck_config() re-usable Move the fsck_config() function from builtin/fsck.c to fsck.[ch]. This allows for re-using it in other tools that expose fsck logic and want to support its configuration variables. A logical continuation of this change would be to use a common function for all of {fetch,receive}.fsck.* and fsck.. See `5d477a334a` (fsck (receive-pack): allow demoting errors to warnings, 2015-06-22) and my own `1362df0d41` (fetch: implement fetch.fsck., 2018-07-27) for the relevant code. However, those routines want to not parse the fsck.skipList into OIDs, but rather pass them along with the --strict option to another process. It would be possible to refactor that whole thing so we support e.g. a "fetch." prefix, then just keep track of the skiplist as a filename instead of parsing it, and learn to spew that all out from our internal structures into something we can append to the --strict option. But instead I'm planning to re-use this in "mktag", which'll just re-use these "fsck.*" variables as-is. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	acf9de4c94	mktag: use fsck instead of custom verify_tag() Change the validation logic in "mktag" to use fsck's fsck_tag() instead of its own custom parser. Curiously the logic for both dates back to the same commit[1]. Let's unify them so we're not maintaining two sets functions to verify that a tag is OK. The behavior of fsck_tag() and the old "mktag" code being removed here is different in few aspects. I think it makes sense to remove some of those checks, namely: A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag code disallowed values larger than 1400. Yes there's currently no timezone with a greater offset[2], but since we allow any number of non-offical timezones (e.g. +1234) passing this through seems fine. Git also won't break in the future if e.g. French Polynesia decides it needs to outdo the Line Islands when it comes to timezone extravagance. B. fsck allows missing author names such as "tagger <email>", mktag wouldn't, but would allow e.g. "tagger [2 spaces] <email>" (but not "tagger [1 space] <email>"). Now we allow all of these. C. Like B, but "mktag" disallowed spaces in the <email> part, fsck allows it. In some ways fsck_tag() is stricter than "mktag" was, namely: D. fsck disallows zero-padded dates, but mktag didn't care. So e.g. the timestamp "0000000000 +0000" produces an error now. A test in "t1006-cat-file.sh" relied on this, it's been changed to use "hash-object" (without fsck) instead. There was one check I deemed worth keeping by porting it over to fsck_tag(): E. "mktag" did not allow any custom headers, and by extension (as an empty commit is allowed) also forbade an extra stray trailing newline after the headers it knew about. Add a new check in the "ignore" category to fsck and use it. This somewhat abuses the facility added in `efaba7cc77` (fsck: optionally ignore specific fsck issues completely, 2015-06-22). This is somewhat of hack, but probably the least invasive change we can make here. The fsck command will shuffle these categories around, e.g. under --strict the "info" becomes a "warn" and "warn" becomes "error". Existing users of fsck's (and others, e.g. index-pack) --strict option rely on this. So we need to put something into a category that'll be ignored by all existing users of the API. Pretending that fsck.extraHeaderEntry=error ("ignore" by default) was set serves to do this for us. 1. `ec4465adb3` (Add "tag" objects that can be used to sign other objects., 2005-04-25) 2. https://en.wikipedia.org/wiki/List_of_UTC_time_offsets Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	40ef015a27	mktag: use puts(str) instead of printf("%s\n", str) This introduces no functional change, but refactors the print-out of the hash at the end to do the same thing with less code. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	dfe3948728	mktag: remove redundant braces in one-line body "if" This minor stylistic churn is usually something we'd avoid, but if we don't do this then the file after changes in subsequent commits will only have this minor style inconsistency, so let's change this while we're at it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	0c439117bb	mktag: use default strbuf_read() hint Change the hardcoded hint of 2^12 to 0. The default strbuf hint is perfectly fine here, and the only reason we were hardcoding it is because it survived migration from a pre-strbuf fixed-sized buffer. See `fd17f5b5f7` (Replace all read_fd use with strbuf_read, and get rid of it., 2007-09-10) for that migration. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	692654dca0	mktag tests: test verify_object() with replaced objects Add tests to demonstrate what "mktag" does in the face of replaced objects. There was an existing test for replaced objects fed to "mktag" added in `cc400f5011` (mktag: call "check_sha1_signature" with the replacement sha1, 2009-01-23), but that one only tests a commit->commit mapping. Not a mapping to a different type as like we're also testing for here. We could remove the "mktag" test in t6050-replace.sh now if the created tag wasn't being used by a subsequent "fsck" test. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	30f882c16d	mktag tests: improve verify_object() test coverage The verify_object() function in "mktag.c" is tasked with ensuring that our tag refers to a valid object. The existing test for this might fail because it was also testing that "type taggg" didn't refer to a valid object type (it should be "type tag"), or because we referred to a valid object but got the type wrong. Let's split these tests up, so we're testing all combinations of a non-existing object and in invalid/wrong "type" lines. We need to provide GIT_TEST_GETTEXT_POISON=false here because the "invalid object type" error is emitted by parse_loose_header_extended(), which has that message already marked for translation. Another option would be to use test_i18ngrep, but I prefer always running the test, not skipping it under gettext poison testing. I'm not testing this in combination with "git replace". That'll be done in a subsequent commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	ca9a1ed969	mktag tests: test "hash-object" compatibility Change all the successful "mktag" tests to test that "hash-object" produces the same hash for the input, and that fsck passes for both. This tests e.g. that "mktag" doesn't trim its input or otherwise munge it in a way that "hash-object" doesn't. Since we're doing an "fsck --strict" here at the end let's incorporate the creation of the "mytag" name into this test, removing the special-case at the end of the file. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	47c95e77d1	mktag tests: stress test whitespace handling Add tests for a couple of whitespace edge cases around the header/body boundary. I consider the requirement for a blank line before the empty body a bug, it's a long-standing regression which goes against the command's documented behavior. This bug will be addressed in a follow-up change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	3b9e4dd3a3	mktag tests: run "fsck" after creating "mytag" Change the last test in the file to run an "fsck --strict" after creating the tag at the end. We're just doing this for good measure to check that fsck behaves as expected now that there's finally a reference for our valid tag. Other tests going to be checking this elsewhere, but it's nice to cover all the edge cases in this test to make it as self-contained as possible. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	5c2303e0c7	mktag tests: don't create "mytag" twice Change a test added in `e0aaf781f6` (mktag.c: improve verification of tagger field and tests, 2008-03-27) to not create "mytag", which should only be created and verified at the end in an earlier test added in `446c6faec6` (New tests and en-passant modifications to mktag., 2006-07-29). While we're at it let's prevent a similar logic error from creeping into the test by asserting that "mytag" doesn't exist before we create it. Let's do this by moving the test to use "update-ref", instead of our own homebrew ad-hoc refstore update. We're not really testing for anything yet by creating the tag at the end here. A subsequent commit will change that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Ævar Arnfjörð Bjarmason	317c176279	mktag tests: don't redirect stderr to a file needlessly Remove the redirection of stderr to "message" in the valid tag test. This pattern seems to have been copy/pasted from the failure case in `446c6faec6` (New tests and en-passant modifications to mktag., 2006-07-29). While I'm at it do the same for the "replace" tests. The tag creation I'm changing here seems to have been copy/pasted from the "mktag" tests to those tests in `cc400f5011` (mktag: call "check_sha1_signature" with the replacement sha1, 2009-01-23). Nobody examines the contents of the resulting "message" file, so the net result is that error messages cannot be seen in "sh t3800-mktag.sh -v" output. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:28 -08:00
Ævar Arnfjörð Bjarmason	0d35ccb5e0	mktag tests: remove needless SHA-1 hardcoding Change the tests amended in `acb49d1cc8` (t3800: make hash-size independent, 2019-08-18) even more to make them independent of either SHA-1 or SHA-256. Some of these tests were failing for the wrong reasons. The first one being modified here would fail because the line starts with "xxxxxx" instead of "object", the rest of the line doesn't matter. Let's just put a valid hash on the rest of the line anyway to narrow the test down for just the s/object/xxxxxx/ case. The second one being modified here would fail under GIT_TEST_DEFAULT_HASH=sha256 because <some sha-1 length garbage> is an invalid SHA-256, but we should really be testing <some sha-256 length garbage> when under SHA-256. This doesn't really matter since we should be able to trust other parts of the code to validate things in the 0-9a-f range, but let's keep it for good measure. There's a later test which tests an invalid SHA which looks like a valid one, to stress the "We refuse to tag something we can't verify[...]" logic in mktag.c. But here we're testing for a SHA-length string which contains characters outside of the /[0-9a-f]/i set. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:28 -08:00
Ævar Arnfjörð Bjarmason	b5ca549c93	mktag tests: use "test_commit" helper Replace ad-hoc setup of a single commit in the "mktag" tests with our standard helper pattern. The old setup dated back to `446c6faec6` (New tests and en-passant modifications to mktag., 2006-07-29) before the helper existed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:28 -08:00
Ævar Arnfjörð Bjarmason	aba5377f69	mktag tests: don't needlessly use a subshell The use of a subshell dates back to `e9b20943b7` (t/t3800: do not use a temporary file to hold expected result., 2008-01-04). It's not needed anymore, if it ever was. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:28 -08:00
Ævar Arnfjörð Bjarmason	18430ed363	mktag doc: update to explain why to use this Change the mktag documentation to compare itself to the similar "hash-object -t tag" command. Before this someone reading the documentation wouldn't have much of an idea what the difference was. Let's allude to our own validation logic, and cross-link the "mktag" and "hash-object" documentation to aid discover-ability. A follow-up change to migrate "mktag" to use "fsck" validation will make the part about validation logic clearer. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:28 -08:00
Derrick Stolee	3797a0a7b7	maintenance: use Windows scheduled tasks Git's background maintenance uses cron by default, but this is not available on Windows. Instead, integrate with Task Scheduler. Tasks can be scheduled using the 'schtasks' command. There are several command-line options that can allow for some advanced scheduling, but unfortunately these seem to all require authenticating using a password. Instead, use the "/xml" option to pass an XML file that contains the configuration for the necessary schedule. These XML files are based on some that I exported after constructing a schedule in the Task Scheduler GUI. These options only run background maintenance when the user is logged in, and more fields are populated with the current username and SID at run-time by 'schtasks'. Since the GIT_TEST_MAINT_SCHEDULER environment variable allows us to specify 'schtasks' as the scheduler, we can test the Windows-specific logic on other platforms. Thus, add a check that the XML file written by Git is valid when xmllint exists on the system. Since we use a temporary file for the XML files sent to 'schtasks', we prefix the random characters with the frequency so it is easier to examine the proper file during tests. Instead of an exact match on the 'args' file, we 'grep' for the arguments other than the filename. There is a deficiency in the current design. Windows has two kinds of applications: GUI applications that start by "winmain()" and console applications that start by "main()". Console applications are attached to a new Console window if they are not already associated with a GUI application. This means that every hour the scheudled task launches a command window for the scheduled tasks. Not only is this visually obtrusive, but it also takes focus from whatever else the user is doing! A simple fix would be to insert a GUI application that acts as a shim between the scheduled task and Git. This is currently possible in Git for Windows by setting the <Command> tag equal to C:\Program Files\Git\git-bash.exe with options "--hide --no-needs-console --command=cmd\git.exe" followed by the arguments currently used. Since git-bash.exe is not included in Windows builds of core Git, I chose to leave out this feature. My plan is to submit a small patch to Git for Windows that converts the use of git.exe with this use of git-bash.exe in the short term. In the long term, we can consider creating this GUI shim application within core Git, perhaps in contrib/. Co-authored-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:38:02 -08:00
Derrick Stolee	2afe7e3567	maintenance: use launchctl on macOS The existing mechanism for scheduling background maintenance is done through cron. The 'crontab -e' command allows updating the schedule while cron itself runs those commands. While this is technically supported by macOS, it has some significant deficiencies: 1. Every run of 'crontab -e' must request elevated privileges through the user interface. When running 'git maintenance start' from the Terminal app, it presents a dialog box saying "Terminal.app would like to administer your computer. Administration can include modifying passwords, networking, and system settings." This is more alarming than what we are hoping to achieve. If this alert had some information about how "git" is trying to run "crontab" then we would have some reason to believe that this dialog might be fine. However, it also doesn't help that some scenarios just leave Git waiting for a response without presenting anything to the user. I experienced this when executing the command from a Bash terminal view inside Visual Studio Code. 2. While cron initializes a user environment enough for "git config --global --show-origin" to show the correct config file information, it does not set up the environment enough for Git Credential Manager Core to load credentials during a 'prefetch' task. My prefetches against private repositories required re-authenticating through UI pop-ups in a way that should not be required. The solution is to switch from cron to the Apple-recommended [1] 'launchd' tool. [1] https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html The basics of this tool is that we need to create XML-formatted "plist" files inside "~/Library/LaunchAgents/" and then use the 'launchctl' tool to make launchd aware of them. The plist files include all of the scheduling information, along with the command-line arguments split across an array of <string> tags. For example, here is my plist file for the weekly scheduled tasks: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"><dict> <key>Label</key><string>org.git-scm.git.weekly</string> <key>ProgramArguments</key> <array> <string>/usr/local/libexec/git-core/git</string> <string>--exec-path=/usr/local/libexec/git-core</string> <string>for-each-repo</string> <string>--config=maintenance.repo</string> <string>maintenance</string> <string>run</string> <string>--schedule=weekly</string> </array> <key>StartCalendarInterval</key> <array> <dict> <key>Day</key><integer>0</integer> <key>Hour</key><integer>0</integer> <key>Minute</key><integer>0</integer> </dict> </array> </dict> </plist> The schedules for the daily and hourly tasks are more complicated since we need to use an array for the StartCalendarInterval with an entry for each of the six days other than the 0th day (to avoid colliding with the weekly task), and each of the 23 hours other than the 0th hour (to avoid colliding with the daily task). The "Label" value is currently filled with "org.git-scm.git.X" where X is the frequency. We need a different plist file for each frequency. The launchctl command needs to be aligned with a user id in order to initialize the command environment. This must be done using the 'launchctl bootstrap' subcommand. This subcommand is new as of macOS 10.11, which was released in September 2015. Before that release the 'launchctl load' subcommand was recommended. The best source of information on this transition I have seen is available at [2]. The current design does not preclude a future version that detects the available fatures of 'launchctl' to use the older commands. However, it is best to rely on the newest version since Apple might completely remove the deprecated version on short notice. [2] https://babodee.wordpress.com/2016/04/09/launchctl-2-0-syntax/ To remove a schedule, we must run 'launchctl bootout' with a valid plist file. We also need to 'bootout' a task before the 'bootstrap' subcommand will succeed, if such a task already exists. The need for a user id requires us to run 'id -u' which works on POSIX systems but not Windows. Further, the need for fully-qualitifed path names including $HOME behaves differently in the Git internals and the external test suite. The $HOME variable starts with "C:\..." instead of the "/c/..." that is provided by Git in these subcommands. The test therefore has a prerequisite that we are not on Windows. The cross- platform logic still allows us to test the macOS logic on a Linux machine. We can verify the commands that were run by 'git maintenance start' and 'git maintenance stop' by injecting a script that writes the command-line arguments into GIT_TEST_MAINT_SCHEDULER. An earlier version of this patch accidentally had an opening "<dict>" tag when it should have had a closing "</dict>" tag. This was caught during manual testing with actual 'launchctl' commands, but we do not want to update developers' tasks when running tests. It appears that macOS includes the "xmllint" tool which can verify the XML format. This is useful for any system that might contain the tool, so use it whenever it is available. We strive to make these tests work on all platforms, but Windows caused some headaches. In particular, the value of getuid() called by the C code is not guaranteed to be the same as `$(id -u)` invoked by a test. This is because `git.exe` is a native Windows program, whereas the utility programs run by the test script mostly utilize the MSYS2 runtime, which emulates a POSIX-like environment. Since the purpose of the test is to check that the input to the hook is well-formed, the actual user ID is immaterial, thus we can work around the problem by making the the test UID-agnostic. Another subtle issue is the $HOME environment variable being a Windows-style path instead of a Unix-style path. We can be more flexible here instead of expecting exact path matches. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Co-authored-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:38:02 -08:00
Felipe Contreras	5a067ba9d0	completion: add proper public __git_complete When __git_complete was introduced, it was meant to be temporarily, while a proper guideline for public shell functions was established (tentatively _GIT_complete), but since that never happened, people in the wild started to use __git_complete, even though it was marked as not public. Eight years is more than enough wait, let's mark this function as public, and make it a bit more user-friendly. So that instead of doing: __git_complete gk __gitk_main The user can do: __git_complete gk gitk And instead of: __git_complete gf _git_fetch Do: __git_complete gf git_fetch Backwards compatibility is maintained. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:25:56 -08:00
Felipe Contreras	0e02bdc17a	test: completion: add tests for __git_complete Even though the function was marked as not public, it's already used in the wild. We should at least test basic functionality. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:25:56 -08:00
Felipe Contreras	810df0ea8e	completion: bash: improve function detection 1. We should quote the argument 2. We don't need two redirections 3. A safeguard for arguments (-a) would be good Suggested-by: René Scharfe <l.s.r@web.de> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:25:56 -08:00
Felipe Contreras	7f94b78dda	completion: bash: add __git_have_func helper This makes the code more readable, and also will help when new code wants to do similar checks. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:25:56 -08:00
Derrick Stolee	fa7ca5d4fe	cache-tree: use trace2 in cache_tree_update() This matches a trace_performance_enter()/trace_performance_leave() pair added by `0d1ed59` (unpack-trees: add performance tracing, 2018-08-18). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:23:08 -08:00
Derrick Stolee	c338898a47	unpack-trees: add trace2 regions The unpack_trees() method is quite complicated and its performance can change dramatically depending on how it is used. We already have some performance tracing regions, but they have not been updated to the trace2 API. Do so now. We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which uses a linear scan through the index without recursing into trees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:23:08 -08:00
Derrick Stolee	da8be8ced6	tree-walk: report recursion counts The traverse_trees() method recursively walks through trees, but also prunes the tree-walk based on a callback. Some callers, such as unpack_trees(), are quite complicated and can have wildly different performance between two different commands. Create constants that count these values and then report the results at the end of a process. These counts are cumulative across multiple "root" instances of traverse_trees(), but they provide reproducible values for demonstrating improvements to the pruning algorithm when possible. This change is modeled after a similar statistics reporting in `42e50e78` (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:23:08 -08:00
Derrick Stolee	90b666da60	revision: trace topo-walk statistics We trace statistics about the effectiveness of changed-path Bloom filters since `42e50e78` (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). Add similar tracing for the topo-walk algorithm that uses generation numbers to limit the walk size. This information can help investigate and describe benefits to heuristics and other changes. The information that is printed is in JSON format and can be formatted nicely to present as follows: { "count_explort_walked":2603, "count_indegree_walked":2603, "count_topo_walked":473 } Each of these values count the number of commits are visited by each of the three "stages" of the topo-walk as detailed in `b4542418` (revision.c: generation-based topo-order algorithm, 2018-11-01). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:18:22 -08:00
Martin Ågren	bc62692757	hash-lookup: rename from sha1-lookup Change all remnants of "sha1" in hash-lookup.c and .h and rename them to reflect that we're not just able to handle SHA-1 these days. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:01:55 -08:00
Martin Ågren	7a7d992d0d	sha1-lookup: rename `sha1_pos()` as `hash_pos()` Rename this function to reflect that we're not just able to handle SHA-1 these days. There are a few instances of "sha1" left in sha1-lookup.[ch] after this, but those will be addressed in the next commit. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:01:55 -08:00
Martin Ågren	e5afd4449d	object-file.c: rename from sha1-file.c Drop the last remnant of "sha1" in this file and rename it to reflect that we're not just able to handle SHA-1 these days. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:01:55 -08:00
Martin Ågren	1e6771e504	object-name.c: rename from sha1-name.c Generalize the last remnants of "sha" and "sha1" in this file and rename it to reflect that we're not just able to handle SHA-1 these days. We need to update one test to check for an updated error string. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:01:55 -08:00
Elijah Newren	350410f6b1	diffcore-rename: remove unnecessary duplicate entry checks Commit `25d5ea410f` ("[PATCH] Redo rename/copy detection logic.", 2005-05-24) added a duplicate entry check on rename_src in order to avoid segfaults; the code at the time was prone to double free()s and an easy way to avoid it was just to turn off rename detection for any duplicate entries. Note that the form of the check was modified two commits ago in this series. Similarly, commit `4d6be03b95` ("diffcore-rename: avoid processing duplicate destinations", 2015-02-26) added a duplicate entry check on rename_dst for the exact same reason -- the code was prone to double free()s, and an easy way to avoid it was just to turn off rename detection entirely. Note that the form of the check was modified in the commit just before this one. In the original code in both places, the code was dealing with individual diff_filespecs and trying to match things up, instead of just keeping the original diff_filepairs around as we do now. The intervening change in structure has fixed the accounting problems and the associated double free()s that used to occur, and thus we already have a better fix. As such, we can remove the band-aid checks for duplicate entries. Due to the last two patches, the diffcore_rename() setup is no longer a sizeable chunk of overall runtime. Thus, in a large rebase of many commits with lots of renames and several optimizations to inexact rename detection, this patch only speeds up the overall code by about half a percent or so and is pretty close to the run-to-run variability making it hard to get an exact measurement. However, with some trace2 regions around the setup code in diffcore_rename() so that I can focus on just it, I measure that this patch consistently saves almost a third of the remaining time spent in diffcore_rename() setup. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 12:59:34 -08:00
Elijah Newren	4ef88fc3a8	merge-ort: add handling for different types of files at same path Add some handling that explicitly considers collisions of the following types: * file/submodule * file/symlink * submodule/symlink Leaving them as conflicts at the same path are hard for users to resolve, so move one or both of them aside so that they each get their own path. Note that in the case of recursive handling (i.e. call_depth > 0), we can just use the merge base of the two merge bases as the merge result much like we do with modify/delete conflicts, binary files, conflicting submodule values, and so on. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	4204cd591b	merge-ort: copy find_first_merges() implementation from merge-recursive.c Code is identical for the function body in the two files, the call signature is just slightly different in merge-ort than merge-recursive as noted a couple commits ago. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	70f19c7fce	merge-ort: implement format_commit() This implementation is based on a mixture of print_commit() and output_commit_title() from merge-recursive.c so that it can be used to take over both functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	c73cda76b1	merge-ort: copy and adapt merge_submodule() from merge-recursive.c Take merge_submodule() from merge-recursive.c and make slight adjustments, predominantly around deferring output using path_msg() instead of using merge-recursive's output() and show() functions. There's also a fix for recursive cases (when call_depth > 0) and a slight change to argument order for find_first_merges(). find_first_merges() and format_commit() are left unimplemented for now, but will be added by subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	f591c47246	merge-ort: copy and adapt merge_3way() from merge-recursive.c Take merge_3way() from merge-recursive.c and make slight adjustments based on different data structures (direct usage of object_id rather diff_filespec, separate pathnames which based on our careful interning of pathnames in opt->priv->paths can be compared with '!=' rather than 'strcmp'). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	62fdec17a1	merge-ort: flesh out implementation of handle_content_merge() This implementation is based heavily on merge_mode_and_contents() from merge-recursive.c, though it has some fixes for recursive merges (i.e. when call_depth > 0), and has a number of changes throughout based on slight differences in data structures and in how the functions are called. It is, however, based on two new helper functions -- merge_3way() and merge_submodule -- for which we only provide die-not-implemented stubs at this point. Future commits will add implementations of these functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	991bbdcab9	merge-ort: handle book-keeping around two- and three-way content merge In addition to the content merge (which will go in a subsequent commit), we need to worry about conflict messages, placing results in higher order stages in case of a df_conflict, and making sure the results are placed in ci->merged.result so that they will show up in the working tree. Take care of all that external book-keeping, moving the simplistic just-take-HEAD code into the barebones handle_content_merge() function for now. Subsequent commits will flesh out handle_content_merge(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	5a1a1e8ea9	merge-ort: implement unique_path() helper Implement unique_path(), based on the one from merge-recursive.c. It is simplified, however, due to: (1) using strmaps, and (2) the fact that merge-ort lets the checkout codepath handle possible collisions with the working tree means that other code locations don't have to. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	23366d2aa9	merge-ort: handle directory/file conflicts that remain When a directory/file conflict remains, we can leave the directory where it is, but need to move the information about the file to a different pathname. After moving the file to a different pathname, we allow subsequent process_entry() logic to handle any additional details that might be relevant. This depends on a new helper function, unique_path(), that dies with an unimplemented error currently but will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	0ccfa4e5d8	merge-ort: handle D/F conflict where directory disappears due to merge When one side has a directory at a given path and the other side of history has a file at the path, but the merge resolves the directory away (e.g. because no path within that directory was modified and the other side deleted it, or because renaming moved all the files elsewhere), then we don't actually have a conflict anymore. We just need to clear away any information related to the relevant directory, and then the subsequent process_entry() handling can handle the given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Junio C Hamano	ffd27e6cb2	CoC: explicitly take any whitespace breakage We'll keep this document mostly in sync with the upstream; let's help "git am" and "git show" by telling them that they may introduce what we may consider whitespace errors. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 09:44:49 -08:00
Ævar Arnfjörð Bjarmason	cb50786f49	CoC: Update word-wrapping to match upstream When the CoC document was added in `5cdf2301d4` (add a Code of Conduct document, 2019-09-24) it was added from some 1.4 version of the document whose word wrapping doesn't match what's currently at [1], which matches content/version/1/4/code-of-conduct.md in the CoC repository[2]. Let's update our version to match that, to make reading subsequent diffs easier. There are no non-whitespace changes here. 1. https://www.contributor-covenant.org/version/1/4/code-of-conduct/ 2. https://github.com/ContributorCovenant/contributor_covenant Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 09:14:38 -08:00
Eric Wong	a9ecaa06a7	core.abbrev=no disables abbreviations This allows users to write hash-agnostic scripts and configs by disabling abbreviations. Using "-c core.abbrev=40" will be insufficient with SHA-256, and "-c core.abbrev=64" won't work with SHA-1 repos today. Signed-off-by: Eric Wong <e@80x24.org> [jc: tweaked implementation, added doc and a test] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-23 13:40:09 -08:00
Ævar Arnfjörð Bjarmason	9ce0fc3311	mktag doc: grammar fix, when exists -> when it exists Amend the wording of documentation added in `6cfec03680` (mktag: minimally update the description., 2007-06-10). It makes more sense to say "when it exists" here, as we're referring to "the message". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-22 17:49:05 -08:00
Ævar Arnfjörð Bjarmason	f59b61dc4d	mktag doc: say <hash> not <sha1> Change the "mktag" documentation to refer to the input hash as just "hash", not "sha1". This command has supported SHA-256 for a while now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-22 17:49:05 -08:00
Eric Sunshine	5bc12c11cc	t/perf: avoid unnecessary test_export() recursion test_export() has been self-recursive since its inception even though a simple for-loop would have served just as well to append its arguments to the `test_export_` variable separated by the pipe character "\|". Recently `test_export_` was changed instead to a space-separated list of tokens to be exported, an operation which can be accomplished via a single simple assignment, with no need for looping or recursion. Therefore, simplify the implementation. While at it, take advantage of the fact that variable names to be exported are shell identifiers, thus won't be composed of special characters or whitespace, thus simple a `$*` can be used rather than magical `"$@"`. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-22 13:45:36 -08:00
Sergey Organov	af04d8f1a5	t4013: add tests for --diff-merges=first-parent This new option provides essential new functionality, changing diff output to first parent only, without changing history traversal mode, so it deserves its own test. As we do it, add additional test that --diff-merges=first-parent by itself doesn't imply -p and only outputs diffs for merge commits. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	1d24509b7b	doc/git-show: include --diff-merges description Move description of --diff-merges option from git-log.txt to diff-options.txt so that it is included in the git-show help. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	e58142add4	doc/rev-list-options: document --first-parent changes merges format After introduction of the --diff-merges=first-parent, the --first-parent sets the default format for merges to the same value as this new option. Document this behavior and add corresponding reference to --diff-merges. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	8efd2efc32	doc/diff-generate-patch: mention new --diff-merges option Mention --diff-merges instead of -m in a note to merge formats to aid discoverability, as -m is now described among --diff-merges options anyway. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	b5ffa9ec10	doc/git-log: describe new --diff-merges options Describe all the new --diff-merges options in the git-log.txt and adopt description of originals accordingly. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	388091fe4d	diff-merges: add '--diff-merges=1' as synonym for 'first-parent' As we now have --diff-merges={m\|c\|cc}, add --diff-merges=1 as synonym for --diff-merges=first-parent, to have shorter mnemonics for it as well. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	5071c75316	diff-merges: add old mnemonic counterparts to --diff-merges This adds --diff-merges={m\|c\|cc} values that match mnemonics of old options, for those who are used to them. Note that, say, --diff-meres=cc behaves differently than --cc, as the latter implies -p and therefore enables diffs for all the commits, while the former enables output of diffs for merge commits only. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	a6d19ecc6b	diff-merges: let new options enable diff without -p New options don't have any visible effect unless -p is either given or implied, as unlike -c/-cc we don't imply -p with --diff-merges. To fix this, this patch adds new functionality by letting new options enable output of diffs for merge commits only. Add 'merges_need_diff' field and set it whenever diff output for merges is enabled by any of the new options. Extend diff output logic accordingly, to output diffs for merges when 'merges_need_diff' is set even when no -p has been provided. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	5733b20f41	diff-merges: do not imply -p for new options Add 'combined_imply_patch' field and set it only for old --cc/-c options, then imply -p if this flag is set instead of implying -p whenever 'combined_merge' flag is set. We don't want new --diff-merge options to imply -p, to make it possible to enable output of diffs for merges independently from non-merge commits. At the same time we want to preserve behavior of old --c/-c/-m options and their interactions with --first-parent, to stay backward-compatible. This patch is first step in this direction: it separates old "--cc/-c imply -p" logic from the rest of the options. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	8c0ba528bc	diff-merges: implement new values for --diff-merges We first implement new options as exact synonyms for their original counterparts, to get all the infrastructure right, and keep functional improvements for later commits. The following values are implemented: --diff-merges= old equivalent first\|first-parent = --first-parent (only format implications) sep\|separate = -m comb\|combined = -c dense\| dense-combined = --cc Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:32 -08:00
Sergey Organov	255a4dacc5	diff-merges: make -m/-c/--cc explicitly mutually exclusive -c/--cc got precedence over -m only because of external logic where corresponding flags are checked before that for -m. This is too error-prone, so add code that explicitly makes these 3 options mutually exclusive, so that the last option specified on the command-line gets precedence. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	3d2b5f2f49	diff-merges: refactor opt settings into separate functions To prepare introduction of new options some of which will be synonyms to existing options, let every option handling code just call corresponding function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	a6e66af923	diff-merges: get rid of now empty diff_merges_init_revs() After getting rid of 'ignore_merges' field, the diff_merges_init_revs() function became empty. Get rid of it. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	d9b1bc6d13	diff-merges: group diff-merge flags next to each other inside 'rev_info' The relevant flags were somewhat scattered over definition of 'struct rev_info'. Rearrange them to group them together. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	1a2c4d8050	diff-merges: split 'ignore_merges' field 'ignore_merges' was 3-way field that served two distinct purposes that we now assign to 2 new independent flags: 'separate_merges', and 'explicit_diff_merges'. 'separate_merges' tells that we need to output diff format containing separate diff for every parent (as opposed to 'combine_merges'). 'explicit_diff_merges' tells that at least one of diff-merges options has been explicitly specified on the command line, so no defaults should apply. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	6fc944d895	diff-merges: fix -m to properly override -c/--cc Logically, -m, -c, --cc specify 3 different formats for representing merge commits, yet -m doesn't in fact override -c or --cc, that makes no sense. Fix -m to properly override -c/--cc, and change the tests accordingly. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	ec315c66bb	t4013: add tests for -m failing to override -c/--cc Logically, -m, -c, --cc specify 3 different formats for representing merge commits, yet -m doesn't in fact override -c or --cc, that makes no sense. Add 2 expected to fail tests that demonstrate the problem. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	14c14b44e4	t4013: support test_expect_failure through ':failure' magic Add support to be able to specify expected failure, through :failure magic, like this: :failure cmd args Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	e121b4b822	diff-merges: revise revs->diff flag handling Do not set revs->diff when we encounter an option that needs it, as it'd be impossible to undo later. Besides, some other options than what we handle here set this flag as well, and we'd interfere with them trying to clear this flag later. Rather set revs->diff, if finally needed, in diff_merges_setup_revs(). As an additional bonus, this also makes our code shorter. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	0c627f5d3c	diff-merges: handle imply -p on -c/--cc logic for log.c Move logic that handles implying -p on -c/--cc from log_setup_revisions_tweak() to diff_merges_setup_revs(), where it belongs. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	3291eea310	diff-merges: introduce revs->first_parent_merges flag This new field allows us to separate format of diff for merges from 'first_parent_only' flag which primary purpose is limiting history traversal. This change further localizes diff format selection logic into the diff-merges.c file. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	3b6c17b5c0	diff-merges: new function diff_merges_set_dense_combined_if_unset() Call it where given functionality is needed instead of direct checking/tweaking of diff merges related fields. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	09322b1da9	diff-merges: new function diff_merges_suppress() This function sets all the relevant flags to disabled state, so that no code that checks only one of them get it wrong. Then we call this new function everywhere where diff merges output suppression is needed. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	564a4fc847	diff-merges: re-arrange functions to match the order they are called in For clarity, define public functions in the order they are called, to make logic inter-dependencies easier to grok. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	4f54544d73	diff-merges: rename diff_merges_default_to_enable() to match semantics Rename diff_merges_default_to_enable() to diff_merges_default_to_first_parent() to match its semantics. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	7acf0d06f5	diff-merges: move checks for first_parent_only out of the module The checks for first_parent_only don't in fact belong to this module, as the primary purpose of this flag is history traversal limiting, so get it out of this module and rename the diff_merges_first_parent_defaults_to_enable() to diff_merges_default_to_enable() to match new semantics. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	18f09473bf	diff-merges: rename all functions to have common prefix Use the same "diff_merges" prefix for all the diff merges function names. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	a37eec6333	revision: move diff merges functions to its own diff-merges.c Create separate diff-merges.c and diff-merges.h files, and move all the code related to handling of diff merges there. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	3d4fd94363	revision: provide implementation for diff merges tweaks Use these implementations from show_setup_revisions_tweak() and log_setup_revisions_tweak() in builtin/log.c. This completes moving of management of diff merges parameters to a single place, where we can finally observe them simultaneously. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	027c4783d7	revision: factor out initialization of diff-merge related settings Move initialization code related to diffing merges into new init_diff_merge_revs() function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	299a663440	revision: factor out setup of diff-merge related settings Move all the setting code related to diffing merges into new setup_diff_merge_revs() function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	891e417cbc	revision: factor out parsing of diff-merge related options Move all the parsing code related to diffing merges into new parse_diff_merge_opts() function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Eric Sunshine	cf76baea41	worktree: teach `repair` to fix multi-directional breakage `git worktree repair` knows how to repair the two-way links between the repository and a worktree as long as a link in one or the other direction is sound. For instance, if a linked worktree is moved (without using `git worktree move`), repair is possible because the worktree still knows the location of the repository even though the repository no longer knows where the worktree is. Similarly, if the repository is moved, repair is possible since the repository still knows the locations of the worktrees even though the worktrees no longer know where the repository is. However, if both the repository and the worktrees are moved, then links are severed in both directions, and no repair is possible. This is the case even when the new worktree locations are specified as arguments to `git worktree repair`. The reason for this limitation is twofold. First, when `repair` consults the worktree's gitfile (/path/to/worktree/.git) to determine the corresponding <repo>/worktrees/<id>/gitdir file to fix, <repo> is the old path to the repository, thus it is unable to fix the `gitdir` file at its new location since it doesn't know where it is. Second, when `repair` consults <repo>/worktrees/<id>/gitdir to find the location of the worktree's gitfile (/path/to/worktree/.git), the path recorded in `gitdir` is the old location of the worktree's gitfile, thus it is unable to repair the gitfile since it doesn't know where it is. Fix these shortcomings by teaching `repair` to attempt to infer the new location of the <repo>/worktrees/<id>/gitdir file when the location recorded in the worktree's gitfile has become stale but the file is otherwise well-formed. The inference is intentionally simple-minded. For each worktree path specified as an argument, `git worktree repair` manually reads the ".git" gitfile at that location and, if it is well-formed, extracts the <id>. It then searches for a corresponding <id> in <repo>/worktrees/ and, if found, concludes that there is a reasonable match and updates <repo>/worktrees/<id>/gitdir to point at the specified worktree path. In order for <repo> to be known, `git worktree repair` must be run in the main worktree or bare repository. `git worktree repair` first attempts to repair each incoming /path/to/worktree/.git gitfile to point at the repository, and then attempts to repair outgoing <repo>/worktrees/<id>/gitdir files to point at the worktrees. This sequence was chosen arbitrarily when originally implemented since the order of fixes is immaterial as long as one side of the two-way link between the repository and a worktree is sound. However, for this new repair technique to work, the order must be reversed. This is because the new inference mechanism, when it is successful, allows the outgoing <repo>/worktrees/<id>/gitdir file to be repaired, thus fixing one side of the two-way link. Once that side is fixed, the other side can be fixed by the existing repair mechanism, hence the order of repairs is now significant. Two safeguards are employed to avoid hijacking a worktree from a different repository if the user accidentally specifies a foreign worktree as an argument. The first, as described above, is that it requires an <id> match between the repository and the worktree. That itself is not foolproof for preventing hijack, so the second safeguard is that the inference will only kick in if the worktree's /path/to/worktree/.git gitfile does not point at a repository. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:44:28 -08:00
Elijah Newren	8119214f4e	merge-ort: implement merge_incore_recursive() Implement merge_incore_recursive(), mostly through the use of a new helper function, merge_ort_internal(), which itself is based off merge_recursive_internal() from merge-recursive.c. This drops the number of failures in the testsuite when run under GIT_TEST_MERGE_ALGORITHM=ort from around 1500 to 647. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-16 21:56:39 -08:00
Elijah Newren	43e9c4eecc	merge-ort: make clear_internal_opts() aware of partial clearing In order to handle recursive merges, after merging merge-bases we need to clear away most of the data we had built up but some of it needs to be kept -- in particular the "output" field. Rename the function to reflect its future change in use. Further, since "reinitialize" means we'll be reusing the fields immediately, take advantage of this to only partially clear maps, leaving the hashtable allocated and pre-sized. (This may be slightly out-of-order since the speedups aren't realized until there are far more strmaps in use, but the patch submission process already went out of order because of various questions and requests for strmap. Anyway, see commit `6ccdfc2a20` ("strmap: enable faster clearing and reusing of strmaps", 2020-11-05), for performance details about the use of strmap_partial_clear().) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-16 21:56:39 -08:00
Elijah Newren	4296d8f17d	merge-ort: copy a few small helper functions from merge-recursive.c In a subsequent commit, we will implement the traditional recursiveness that gave merge-recursive its name, namely merging non-unique merge-bases to come up with a single virtual merge base. Copy a few helper functions from merge-recursive.c that we will use in the implementation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-16 21:56:39 -08:00
Elijah Newren	b0ca120554	commit: move reverse_commit_list() from merge-recursive Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-16 21:56:39 -08:00
Felipe Contreras	c525de335e	pull: display default warning only when non-ff There's no need to display the annoying warning on every pull... only the ones that are not fast-forward. The current warning tests still pass, but not because of the arguments or the configuration, but because they are all fast-forward. We need to test non-fast-forward situations now. Suggestions-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:39:42 -08:00
Junio C Hamano	7539fdc629	pull: correct condition to trigger non-ff advice Refactor the advise() call that teaches users how they can choose between merge and rebase into a helper function. This revealed that the caller's logic needs to be further clarified to allow future actions (like "erroring out" instead of the current "go ahead and merge anyway") that should happen whether the advice message is squelched out. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:39:42 -08:00
Junio C Hamano	b044db9172	pull: get rid of unnecessary global variable It is easy enough to do, and gives a more descriptive name to the variable that is scoped in a more focused way. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:39:17 -08:00
Elijah Newren	6fcccbd755	merge-ort: add implementation of type-changed rename handling Implement cases where renames are involved in type changes (i.e. the side of history that didn't rename the file changed its type from a regular file to a symlink or submodule). There was some code to handle this in merge-recursive but only in the special case when the renamed file had no content changes. The code here works differently -- it knows process_entry() can handle mode conflicts, so it does a few minimal tweaks to ensure process_entry() can just finish the job as needed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:18:32 -08:00
Elijah Newren	f1665e6918	merge-ort: add implementation of normal rename handling Implement handling of normal renames. This code replaces the following from merge-recurisve.c: * the code relevant to RENAME_NORMAL in process_renames() * the RENAME_NORMAL case of process_entry() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_rename_normal() * setup_rename_conflict_info() The consolidation of four separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by `66c62eaec6` ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit `ef52778708` ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. (To be fair, the code for handling normal renames wasn't all that complicated beforehand, but it's still much simpler now.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:18:32 -08:00
Elijah Newren	35e47e3514	merge-ort: add implementation of rename collisions Implement rename/rename(2to1) and rename/add handling, i.e. a file is renamed into a location where another file is added (with that other file either being a plain add or itself coming from a rename). Note that rename collisions can also have a special case stacked on top: the file being renamed on one side of history is deleted on the other (yielding either a rename/add/delete conflict or perhaps a rename/rename(2to1)/delete[/delete]) conflict. One thing to note here is that when there is a double rename, the code in question only handles one of them at a time; a later iteration through the loop will handle the other. After they've both been handled, process_entry()'s normal add/add code can handle the collision. This code replaces the following from merge-recurisve.c: * all the 2to1 code in process_renames() * the RENAME_TWO_FILES_TO_ONE case of process_entry() * handle_rename_rename_2to1() * handle_rename_add() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_file_collision() * setup_rename_conflict_info() The consolidation of six separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by `66c62eaec6` ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit `ef52778708` ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:18:32 -08:00
Elijah Newren	2e91ddd24e	merge-ort: add implementation of rename/delete conflicts Implement rename/delete conflicts, i.e. one side renames a file and the other deletes the file. This code replaces the following from merge-recurisve.c: * the code relevant to RENAME_DELETE in process_renames() * the RENAME_DELETE case of process_entry() * handle_rename_delete() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_change_delete() * setup_rename_conflict_info() The consolidation of five separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by `66c62eaec6` ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit `ef52778708` ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. To be fair, there is a _slight_ tweak to process_entry() here, because rename/delete cases will also trigger the modify/delete codepath. However, we only want a modify/delete message to be printed for a rename/delete conflict if there is a content change in the renamed file in addition to the rename. So process_renames() and process_entry() aren't quite fully orthogonal, but they are pretty close. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:18:32 -08:00
Elijah Newren	53e88a0353	merge-ort: add implementation of both sides renaming differently Implement rename/rename(1to2) handling, i.e. both sides of history renaming a file and rename it differently. This code replaces the following from merge-recurisve.c: * all the 1to2 code in process_renames() * the RENAME_ONE_FILE_TO_TWO case of process_entry() * handle_rename_rename_1to2() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_file_collision() * setup_rename_conflict_info() The consolidation of five separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by `66c62eaec6` ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit `ef52778708` ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. To be fair, there is a _slight_ tweak to process_entry() here to make sure that the two different paths aren't marked as clean but are left in a conflicted state. So process_renames() and process_entry() aren't quite entirely orthogonal, but they are pretty close. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:18:32 -08:00
Elijah Newren	af1e56c49e	merge-ort: add implementation of both sides renaming identically Implement rename/rename(1to1) handling, i.e. both sides of history renaming a file but renaming the same way. This code replaces the following from merge-recurisve.c: * all the 1to1 code in process_renames() * the RENAME_ONE_FILE_TO_ONE case of process_entry() Also, there is some shared code from merge-recursive.c for multiple different rename cases which we will no longer need for this case (or other rename cases): * handle_rename_normal() * setup_rename_conflict_info() The consolidation of four separate codepaths into one is made possible by a change in design: process_renames() tweaks the conflict_info entries within opt->priv->paths such that process_entry() can then handle all the non-rename conflict types (directory/file, modify/delete, etc.) orthogonally. This means we're much less likely to miss special implementation of some kind of combination of conflict types (see commits brought in by `66c62eaec6` ("Merge branch 'en/merge-tests'", 2020-11-18), especially commit `ef52778708` ("merge tests: expect improved directory/file conflict handling in ort", 2020-10-26) for more details). That, together with letting worktree/index updating be handled orthogonally in the merge_switch_to_result() function, dramatically simplifies the code for various special rename cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 17:18:32 -08:00
Junio C Hamano	c3b58472be	pack-redundant: gauge the usage before proposing its removal The subcommand is unusably slow and the reason why nobody reports it as a performance bug is suspected to be the absense of users. Let's show a big message that asks the user to tell us that they still care about the command when an attempt is made to run the command, with an escape hatch to override it with a command line option. In a few releases, we may turn it into an error and keep it for a few more releases before finally removing it (during the whole time, the plan to remove it would be interrupted by end user raising hand). Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 14:30:11 -08:00
Elijah Newren	9db2ac5616	diffcore-rename: accelerate rename_dst setup register_rename_src() simply references the passed pair inside rename_src. In contrast, add_rename_dst() did something entirely different for rename_dst. Instead of copying the passed pair, it made a copy of the second diff_filespec from the passed pair, referenced it, and then set the diff_rename_dst.pair field to NULL. Later, when a pairing is found, record_rename_pair() allocated a full diff_filepair via diff_queue() and pointed its src and dst fields at the appropriate diff_filespecs. This contrast between register_rename_src() for the rename_src data structure and add_rename_dst() for the rename_dst data structure is oddly inconsistent and requires more memory and work than necessary. Let's just reference the original diff_filepair in rename_dst as-is, just as we do with rename_src. Add a new rename_dst.is_rename field, since the rename_dst.p field is never NULL unlike the old rename_dst.pair field. Taking advantage of this change and the fact that same-named paths will be adjacent, we can get rid of the sorting of the array and most of the lookups on it, allowing us to instead just append as we go. However, there is one remaining reason to still keep locate_rename_dst(): handling broken pairs (i.e. when break detection is on). Those are somewhat rare, but we can set up a simple strintmap to get the map between the source and the index. Doing that allows us to still have a fast lookup without sorting the rename_dst array. Since the sorting had been done in a weakly quadratic manner, when many renames are involved this time could add up. There is still a strcmp() in add_rename_dst() that I have left in place to make it easier to verify that the algorithm has the same results. This strcmp() is there to check for duplicate destination entries (which was the easiest way at the time to avoid segfaults in the diffcore-rename code when trees had multiple entries at a given path). The underlying double free()s are no longer an issue with the new algorithm, but that can be addressed in a subsequent commit. This patch is being submitted in a different order than its original development, but in a large rebase of many commits with lots of renames and with several optimizations to inexact rename detection, both setup time and write back to output queue time from diffcore_rename() were sizeable chunks of overall runtime. This patch accelerated the setup time by about 65%, and final write back to the output queue time by about 50%, resulting in an overall drop of 3.5% on the execution time of rebasing a few dozen patches. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	b970b4ef62	diffcore-rename: simplify and accelerate register_rename_src() register_rename_src() took pains to create an array in rename_src which was sorted by pathname of the contained diff_filepair. The sorting was entirely unnecessary since callers pass filepairs to us in sorted order. We can simply append to the end of the rename_src array, speeding up diffcore_rename() setup time. Also, note that I dropped the return type on the function since it was unconditionally discarded anyway. This patch is being submitted in a different order than its original development, but in a large rebase of many commits with lots of renames and with several optimizations to inexact rename detection, diffcore_rename() setup time was a sizeable chunk of overall runtime. This patch dropped execution time of rebasing 35 commits with lots of renames by 2% overall. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	ac14de13b2	t4058: explore duplicate tree entry handling in a bit more detail While creating the last commit, I found a number of other cases where git would segfault when faced with trees that have duplicate entries. None of these segfaults are in the diffcore-rename code (they all occur in cache-tree and unpack-trees). Further, to my knowledge, no one has ever been adversely affected by these bugs, and given that it has been 15 years and folks have fixed a few other issues with historical duplicate entries (as noted in the last commit), I am not sure we will ever run into anyone having problems with these. So I am not sure these are worth fixing, but it doesn't hurt to at least document these failures in the same test file that is concerned with duplicate tree entries. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	5c72261c66	t4058: add more tests and documentation for duplicate tree entry handling Commit `4d6be03b95` ("diffcore-rename: avoid processing duplicate destinations", 2015-02-26) added t4058 to demonstrate that a workaround it added to avoid double frees (namely to just turn off rename detection when trees had duplicate entries) would indeed avoid segfaults. The tests, though, give the impression that the expected diffs are "correct" when in reality they are just "don't segfault, and do something semi-reasonable under the circumstances". Add some notes to make this clearer. Also, commit `25d5ea410f` ("[PATCH] Redo rename/copy detection logic.", 2005-05-24) added a similar workaround to avoid segfaults, but for rename_src rather than rename_dst. I do not see any tests in the testsuite to cover the collision detection of entries limited to the source side, so add a couple. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	81c4bf0296	diffcore-rename: reduce jumpiness in progress counters Inexact rename detection works by comparing all sources to all destinations, computing similarities, and then finding the best matches among those that are sufficiently similar. However, it is preceded by exact rename detection that works by checking if there are files with identical hashes. If exact renames are found, we can exclude some files from inexact rename detection. The inexact rename detection loops over the full set of files, but immediately skips those for which rename_dst[i].is_rename is true and thus doesn't compare any sources to that destination. As such, these paths shouldn't be included in the progress counter. For the eagle eyed, this change hints at an actual optimization -- the first one I presented at Git Merge 2020. I'll be submitting that optimization later, once the basic merge-ort algorithm has merged. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	ad8a1be529	diffcore-rename: simplify limit check diffcore-rename had two different checks of the form if ((a < limit \|\| b < limit) && a * b <= limit * limit) This can be simplified to if (st_mult(a, b) <= st_mult(limit, limit)) which makes it clearer how we are checking for overflow, and makes it much easier to parse given the drop from 8 to 4 variable appearances. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	00b8cccdd8	diffcore-rename: avoid usage of global in too_many_rename_candidates() too_many_rename_candidates() got the number of rename destinations via an argument to the function, but the number of rename sources via a global variable. That felt rather inconsistent. Pass in the number of rename sources as an argument as well. While we are at it... We had a local variable, num_src, that served two purposes. Initially it was set to the global value, but later was used for counting a subset of the number of sources. Since we now have a function argument for the former usage, introduce a clearer variable name for the latter usage. This patch has no behavioral changes; it's just renaming and passing an argument instead of grabbing it from the global namespace. (You may find it easier to view the patch using git diff's --color-words option.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Elijah Newren	26a66a6b1c	diffcore-rename: rename num_create to num_destinations Our main data structures are rename_src and rename_dst. For counters of these data structures, num_sources and num_destinations seem natural; definitely more so than using num_create for the latter. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:34:50 -08:00
Felipe Contreras	278f4be806	pull: give the advice for choosing rebase/merge much later Eventually we want to be omit the advice when we can fast-forward in which case there is no reason to require the user to choose between rebase or merge. In order to do so, we need to delay giving the advice up to the point where we can check if we can fast-forward or not. Additionally, config_get_rebase() was probably never its true home. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 09:03:17 -08:00
Felipe Contreras	77a7ec6329	pull: refactor fast-forward check We would like to be able to make this check before the decision to rebase is made in a future step. Besides, using a separate helper makes the code easier to follow. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:59:40 -08:00
Elijah Newren	c2d267df02	merge-ort: add basic outline for process_renames() Add code which determines which kind of special rename case each rename corresponds to, but leave the handling of each type unimplemented for now. Future commits will implement each one. There is some tenuous resemblance to merge-recursive's process_renames(), but comparing the two is very unlikely to yield any insights. merge-ort's process_renames() is a bit complex and I would prefer if I could simplify it more, but it is far easier to grok than merge-recursive's function of the same name in my opinion. Plus, merge-ort handles more rename conflict types than merge-recursive does. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:45:59 -08:00
Elijah Newren	965a7bc21c	merge-ort: implement compare_pairs() and collect_renames() Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:45:59 -08:00
Elijah Newren	f39d05ca26	merge-ort: implement detect_regular_renames() Based heavily on merge-recursive's get_diffpairs() function, and also includes the necessary paired call to diff_warn_rename_limit() so that users will be warned if merge.renameLimit is not sufficiently large for rename detection to run. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:45:59 -08:00
Elijah Newren	e1a124e8dc	merge-ort: add initial outline for basic rename detection Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:45:58 -08:00
Elijah Newren	864075ec43	merge-ort: add basic data structures for handling renames This will grow later, but we only need a few fields for basic rename handling. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:45:58 -08:00
Elijah Newren	c5a6f65527	merge-ort: add modify/delete handling and delayed output processing The focus here is on adding a path_msg() which will queue up warning/conflict/notice messages about the merge for later processing, storing these in a pathname -> strbuf map. It might seem like a big change, but it really just is: * declaration of necessary map with some comments * initialization and recording of data * a bunch of code to iterate over the map at print/free time * at least one caller in order to avoid an error about having an unused function (which we provide in the form of implementing modify/delete conflict handling). At this stage, it is probably not clear why I am opting for delayed output processing. There are multiple reasons: 1. Merges are supposed to abort if they would overwrite dirty changes in the working tree. We cannot correctly determine whether changes would be overwritten until both rename detection has occurred and full processing of entries with the renames has finalized. Warning/conflict/notice messages come up at intermediate codepaths along the way, so unless we want spurious conflict/warning messages being printed when the merge will be aborted anyway, we need to save these messages and only print them when relevant. 2. There can be multiple messages for a single path, and we want all messages for a give path to appear together instead of having them grouped by conflict/warning type. This was a problem already with merge-recursive.c but became even more important due to the splitting apart of conflict types as discussed in the commit message for `1f3c9ba707` ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) 3. Some callers might want to avoid showing the output in certain cases, such as if the end result is a clean merge. Rebases have typically done this. 4. Some callers might not want the output to go to stdout or even stderr, but might want to do something else with it entirely. For example, a --remerge-diff option to `git show` or `git log -p` that remerges on the fly and diffs merge commits against the remerged version would benefit from stdout/stderr not being written to in the standard form. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	e2e9dc030c	merge-ort: add die-not-implemented stub handle_content_merge() function This simplistic and weird-looking patch is here to facilitate future patch submissions. Adding this stub allows rename detection code to reference it in one patch series, while a separate patch series can define the implementation, and then both series can merge cleanly and work nicely together at that point. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	04af1879b9	merge-ort: add function grouping comments Commit b658536f59 ("merge-ort: add some high-level algorithm structure", 2020-10-27) added high-level structure of the ort merge algorithm. As we have added more and more functions, that high-level structure has been slightly obscured. Since functions are still grouped according to this high-level structure, add comments denoting sections where all the functions are specifically tied to a piece of the high-level structure. This function groupings include a few sub-divisions of the original high-level structure, including some sub-divisions that are yet to be submitted. Each has (or will have) several functions all serving as helpers to one or two main functions for each section. As an added bonus, the comments will serve to provide a small textual separation between nearby sections and allow the next three patch series to be submitted independently and merge cleanly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	43c1dccb91	merge-ort: add a paths_to_free field to merge_options_internal This field will be used in future patches to allow removal of paths from opt->priv->paths. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	1c7873cdf4	merge-ort: add a path_conflict field to merge_options_internal This field is not yet used, but will be used by both the rename handling code, and the conflict type handling code in process_entry(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:40 -08:00
Elijah Newren	101bc5bc2d	merge-ort: add a clear_internal_opts helper Move most of merge_finalize() into a new helper function, clear_internal_opts(). This is a step to facilitate recursive merges, as well as some future optimizations. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:21:03 -08:00
Elijah Newren	67845745c1	merge-ort: add a few includes Include blob.h for definition of blob_type, and commit-reach.h for declarations of get_merge_bases() and in_merge_bases(). While none of these are used yet, we want to avoid cross-dependencies in the next three series of patches for merge-ort and merge them at the end; adding these "#include"s now avoids textual conflicts. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:21:03 -08:00
Elijah Newren	89422d29b1	merge-ort: free data structures in merge_finalize() Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	ef2b369387	merge-ort: add implementation of record_conflicted_index_entries() After checkout(), the working tree has the appropriate contents, and the index matches the working copy. That means that all unmodified and cleanly merged files have correct index entries, but conflicted entries need to be updated. We do this by looping over the conflicted entries, marking the existing index entry for the path with CE_REMOVE, adding new higher order staged for the path at the end of the index (ignoring normal index sort order), and then at the end of the loop removing the CE_REMOVED-marked cache entries and sorting the index. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	70912f66de	tree: enable cmp_cache_name_compare() to be used elsewhere Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	6681ce5cf6	merge-ort: add implementation of checkout() Since merge-ort creates a tree for its output, when there are no conflicts, updating the working tree and index is as simple as using the unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent of a "checkout" operation). If there were conflicts in the merge, then since the tree we created included all the conflict markers, then using the unpack_trees machinery in this manner will still update the working tree correctly. Further, all index entries corresponding to cleanly merged files will also be updated correctly by this procedure. Index entries corresponding to conflicted entries will appear as though the user had run "git add -u" after the merge to accept all files as-is with conflict markers. Thus, after running unpack_trees(), there needs to be a separate step for updating the entries in the index corresponding to conflicted files. This will be the job for the function record_conflicted_index_entris(), which will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	9fefce68dc	merge-ort: basic outline for merge_switch_to_result() This adds a basic implementation for merge_switch_to_result(), though just in terms of a few new empty functions that will be defined in subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	bb470f4e13	merge-ort: step 3 of tree writing -- handling subdirectories as we go Our order for processing of entries means that if we have a tree of files that looks like Makefile src/moduleA/foo.c src/moduleA/bar.c src/moduleB/baz.c src/moduleB/umm.c tokens.txt Then we will process paths in the order of the leftmost column below. I have added two additional columns that help explain the algorithm that follows; the 2nd column is there to remind us we have oid & mode info we are tracking for each of these paths (which differs between the paths which I'm not representing well here), and the third column annotates the parent directory of the entry: tokens.txt <version_info> "" src/moduleB/umm.c <version_info> src/moduleB src/moduleB/baz.c <version_info> src/moduleB src/moduleB <version_info> src src/moduleA/foo.c <version_info> src/moduleA src/moduleA/bar.c <version_info> src/moduleA src/moduleA <version_info> src src <version_info> "" Makefile <version_info> "" When the parent directory changes, if it's a subdirectory of the previous parent directory (e.g. "" -> src/moduleB) then we can just keep appending. If the parent directory differs from the previous parent directory and is not a subdirectory, then we should process that directory. So, for example, when we get to this point: tokens.txt <version_info> "" src/moduleB/umm.c <version_info> src/moduleB src/moduleB/baz.c <version_info> src/moduleB and note that the next entry (src/moduleB) has a different parent than the last one that isn't a subdirectory, we should write out a tree for it 100644 blob <HASH> umm.c 100644 blob <HASH> baz.c then pop all the entries under that directory while recording the new hash for that directory, leaving us with tokens.txt <version_info> "" src/moduleB <new version_info> src This process repeats until at the end we get to tokens.txt <version_info> "" src <new version_info> "" Makefile <version_info> "" and then we can write out the toplevel tree. Since we potentially have entries in our string_list corresponding to multiple different toplevel directories, e.g. a slightly different repository might have: whizbang.txt <version_info> "" tokens.txt <version_info> "" src/moduleD <new version_info> src src/moduleC <new version_info> src src/moduleB <new version_info> src src/moduleA/foo.c <version_info> src/moduleA src/moduleA/bar.c <version_info> src/moduleA When src/moduleA is popped off, we need to know that the "last directory" reverts back to src, and how many entries in our string_list are associated with that parent directory. So I use an auxiliary offsets string_list which would have (parent_directory,offset) information of the form "" 0 src 2 src/moduleA 5 Whenever I write out a tree for a subdirectory, I set versions.nr to the final offset value and then decrement offsets.nr...and then add an entry to versions with a hash for the new directory. The idea is relatively simple, there's just a lot of accounting to implement this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	ee4012dcf9	merge-ort: step 2 of tree writing -- function to create tree object Create a new function, write_tree(), which will take a list of basenames, modes, and oids for a single directory and create a tree object in the object-store. We do not yet have just basenames, modes, and oids for just a single directory (we have a mixture of entries from all directory levels in the hierarchy) so we still die() before the current call to write_tree(), but the next patch will rectify that. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	a9945bba60	merge-ort: step 1 of tree writing -- record basenames, modes, and oids As a step towards transforming the processed path->conflict_info entries into an actual tree object, start recording basenames, modes, and oids in a dir_metadata structure. Subsequent commits will make use of this to actually write a tree. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	8adffaa818	merge-ort: have process_entries operate in a defined order We want to handle paths below a directory before needing to handle the directory itself. Also, we want to handle the directory immediately after the paths below it, so we can't use simple lexicographic ordering from strcmp (which would insert foo.txt between foo and foo/file.c). Copy string_list_df_name_compare() from merge-recursive.c, and set up a string list of paths sorted by that function so that we can iterate in the desired order. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	6a02dd90c9	merge-ort: add a preliminary simple process_entries() implementation Add a process_entries() implementation that just loops over the paths and processes each one individually with an auxiliary process_entry() call. Add a basic process_entry() as well, which handles several cases but leaves a few of the more involved ones with die-not-implemented messages. Also, although process_entries() is supposed to create a tree, it does not yet have code to do so -- except in the special case of merging completely empty trees. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	291f29caf6	merge-ort: avoid recursing into identical trees When all three trees have the same oid, there is no need to recurse into these trees to find that all files within them happen to match. We can just record any one of the trees as the resolution of merging that particular path. Immediately resolving trees for other types of trivial tree merges (such as one side matches the merge base, or the two sides match each other) would prevent us from detecting renames for some paths, and thus prevent us from doing three-way content merges for those paths whose renames we did not detect. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	98bf984167	merge-ort: record stage and auxiliary info for every path Create a helper function, setup_path_info(), which can be used to record all the information we want in a merged_info or conflict_info. While there is currently only one caller of this new function, and some of its particular parameters are fixed, future callers of this function will be added later. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	34e557af54	merge-ort: compute a few more useful fields for collect_merge_info Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	885f0063e9	merge-ort: avoid repeating fill_tree_descriptor() on the same tree Three-way merges, by their nature, are going to often have two or more trees match at a given subdirectory. We can avoid calling fill_tree_descriptor() on the same tree by checking when these trees match. Noting when various oids match will also be useful in other calculations and optimizations as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	d2bc1994f3	merge-ort: implement a very basic collect_merge_info() This does not actually collect any necessary info other than the pathnames involved, since it just allocates an all-zero conflict_info and stuffs that into paths. However, it invokes the traverse_trees() machinery to walk over all the paths and sets up the basic infrastructure we need. I have left out a few obvious optimizations to try to make this patch as short and obvious as possible. A subsequent patch will add some of those back in with some more useful data fields before we introduce a patch that actually sets up the conflict_info fields. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	0c0d705b5c	merge-ort: add an err() function similar to one from merge-recursive Various places in merge-recursive used an err() function when it hit some kind of unrecoverable error. That code was from the reusable bits of merge-recursive.c that we liked, such as merge_3way, writing object files to the object store, reading blobs from the object store, etc. So create a similar function to allow us to port that code over, and use it for when we detect problems returned from collect_merge_info()'s traverse_trees() call, which we will be adding next. While we are at it, also add more documentation for the "clean" field from struct merge_result, particularly since the name suggests a boolean but it is not quite one and this is our first non-boolean usage. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	c8017176ac	merge-ort: use histogram diff In my cursory investigation, histogram diffs are about 2% slower than Myers diffs. Others have probably done more detailed benchmarks. But, in short, histogram diffs have been around for years and in a number of cases provide obviously better looking diffs where Myers diffs are unintelligible but the performance hit has kept them from becoming the default. However, there are real merge bugs we know about that have triggered on git.git and linux.git, which I don't have a clue how to address without the additional information that I believe is provided by histogram diffs. See the following: https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/ https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/ https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/ I don't like mismerges. I really don't like silent mismerges. While I am sometimes willing to make performance and correctness tradeoff, I'm much more interested in correctness in general. I want to fix the above bugs. I have not yet started doing so, but I believe histogram diff at least gives me an angle. Unfortunately, I can't rely on using the information from histogram diff unless it's in use. And it hasn't been used because of a few percentage performance hit. In testcases I have looked at, merge-ort is _much_ faster than merge-recursive for non-trivial merges/rebases/cherry-picks. As such, this is a golden opportunity to switch out the underlying diff algorithm (at least the one used by the merge machinery; git-diff and git-log are separate questions); doing so will allow me to get additional data and improved diffs, and I believe it will help me fix the above bugs at some point in the future. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	e4171b1b6d	merge-ort: port merge_start() from merge-recursive merge_start() basically does a bunch of sanity checks, then allocates and initializes opt->priv -- a struct merge_options_internal. Most of the sanity checks are usable as-is. The allocation/intialization is a bit different since merge-ort has a very different merge_options_internal than merge-recursive, but the idea is the same. The weirdest part here is that merge-ort and merge-recursive use the same struct merge_options, even though merge_options has a number of fields that are oddly specific to merge-recursive's internal implementation and don't even make sense with merge-ort's high-level design (e.g. buffer_output, which merge-ort has to always do). I reused the same data structure because: * most the fields made sense to both merge algorithms * making a new struct would have required making new enums or somehow externalizing them, and that was getting messy. * it simplifies converting the existing callers by not having to have different code paths for merge_options setup. I also marked detect_renames as ignored. We can revisit that later, but in short: merge-recursive allowed turning off rename detection because it was sometimes glacially slow. When you speed something up by a few orders of magnitude, it's worth revisiting whether that justification is still relevant. Besides, if folks find it's still too slow, perhaps they have a better scaling case than I could find and maybe it turns up some more optimizations we can add. If it still is needed as an option, it is easy to add later. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	231e2dd49d	merge-ort: add some high-level algorithm structure merge_ort_nonrecursive_internal() will be used by both merge_inmemory_nonrecursive() and merge_inmemory_recursive(); let's focus on it for now. It involves some setup -- merge_start() -- followed by the following chain of functions: collect_merge_info() This function will populate merge_options_internal's paths field, via a call to traverse_trees() and a new callback that will be added later. detect_and_process_renames() This function will detect renames, and then adjust entries in paths to move conflict stages from old pathnames into those for new pathnames, so that the next step doesn't have to think about renames and just can do three-way content merging and such. process_entries() This function determines how to take the various stages (versions of a file from the three different sides) and merge them, and whether to mark the result as conflicted or cleanly merged. It also writes out these merged file versions as it goes to create a tree. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	5b59c3db05	merge-ort: setup basic internal data structures Set up some basic internal data structures. The only carry-over from merge-recursive.c is call_depth, though needed_rename_limit will be added later. The central piece of data will definitely be the strmap "paths", which will map every relevant pathname under consideration to either a merged_info or a conflict_info. ("conflicted" is a strmap that is a subset of "paths".) merged_info contains all relevant information for a non-conflicted entry. conflict_info contains a merged_info, plus any additional information about a conflict such as the higher orders stages involved and the names of the paths those came from (handy once renames get involved). If an entry remains conflicted, the merged_info portion of a conflict_info will later be filled with whatever version of the file should be placed in the working directory (e.g. an as-merged-as-possible variation that contains conflict markers). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
brian m. carlson	fac60b8925	rev-parse: add option for absolute or relative path formatting git rev-parse has several options which print various paths. Some of these paths are printed relative to the current working directory, and some are absolute. Normally, this is not a problem, but there are times when one wants paths entirely in one format or another. This can be done trivially if the paths are canonical, but canonicalizing paths is not possible on some shell scripting environments which lack realpath(1) and also in Go, which lacks functions that properly canonicalize paths on Windows. To help out the scripter, let's provide an option which turns most of the paths printed by git rev-parse to be either relative to the current working directory or absolute and canonical. Document which options are affected and which are not so that users are not confused. This approach is cleaner and tidier than providing duplicates of existing options which are either relative or absolute. Note that if the user needs both forms, it is possible to pass an additional option in the middle of the command line which changes the behavior of subsequent operations. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-12 23:35:51 -08:00
brian m. carlson	be6e0daee7	abspath: add a function to resolve paths with missing components Currently, we have a function to resolve paths, strbuf_realpath. This function canonicalizes paths like realpath(3), but permits a trailing component to be absent from the file system. In other words, this is the behavior of the GNU realpath(1) without any arguments. In the future, we'll need this same behavior, except that we want to allow for any number of missing trailing components, which is the behavior of GNU realpath(1) with the -m option. This is useful because we'll want to canonicalize a path that may point to a not yet present path under the .git directory. For example, a user may want to know where an arbitrary ref would be stored if it existed in the file system. Let's refactor strbuf_realpath to move most of the code to an internal function and then pass it two flags to control its behavior. We'll add a strbuf_realpath_forgiving function that has our new behavior, and leave strbuf_realpath with the older, stricter behavior. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-12 23:35:47 -08:00
Ævar Arnfjörð Bjarmason	058761f1c1	pretty format %(trailers): add a "key_value_separator" Add a "key_value_separator" option to the "%(trailers)" pretty format, to go along with the existing "separator" argument. In combination these two options make it trivial to produce machine-readable (e.g. \0 and \0\0-delimited) format output. As elaborated on in a previous commit which added "keyonly" it was needlessly tedious to extract structured data from "%(trailers)" before the addition of this "key_value_separator" option. As seen by the test being added here extracting this data now becomes trivial. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-09 14:16:42 -08:00
Ævar Arnfjörð Bjarmason	9d87d5ae02	pretty format %(trailers): add a "keyonly" Add support for a "keyonly". This allows for easier parsing out of the key and value. Before if you didn't want to make assumptions about how the key was formatted. You'd need to parse it out as e.g.: --pretty=format:'%H%x00%(trailers:separator=%x00%x00)' \ '%x00%(trailers:separator=%x00%x00,valueonly)' And then proceed to deduce keys by looking at those two and subtracting the value plus the hardcoded ": " separator from the non-valueonly %(trailers) line. Now it's possible to simply do: --pretty=format:'%H%x00%(trailers:separator=%x00%x00,keyonly)' \ '%x00%(trailers:separator=%x00%x00,valueonly)' Which at least reduces it to a state machine where you get N keys and correlate them with N values. Even better would be to have a way to change the ": " delimiter to something easily machine-readable (a key might contain ": " too). A follow-up change will add support for that. I don't really have a use-case for just "keyonly" myself. I suppose it would be useful in some cases as "key=*" matches case-insensitively, so a plain "keyonly" will give you the variants of the keys you matched. I'm mainly adding it to fix the inconsistency with "valueonly". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-09 14:16:42 -08:00
Ævar Arnfjörð Bjarmason	8b966a0506	pretty-format %(trailers): fix broken standalone "valueonly" Fix %(trailers:valueonly) being a noop due to on overly eager optimization in format_trailer_info() which skips custom formatting if no custom options are given. When "valueonly" was added in `d9b936db52` (pretty: add support for "valueonly" option in %(trailers), 2019-01-28) we forgot to add it to the list of options that optimization checks for. See e.g. the addition of "key" in `250bea0c16` (pretty: allow showing specific trailers, 2019-01-28) for a similar change where this wasn't missed. Thus the "valueonly" option in "%(trailers:valueonly)" was a noop and the output was equivalent to that of a plain "%(trailers)". This wasn't caught because the tests for it always combined it with other options. Fix the bug by adding !opts->value_only to the list. I initially attempted to make this more future-proof by setting a flag if we got to ":" in "%(trailers:" in format_commit_one() in pretty.c. However, "%(trailers:" is also parsed in trailers_atom_parser() in ref-filter.c. There is an outstanding patch[1] unify those two, and such a fix, or other future-proofing, such as changing "process_trailer_options" flags into a bitfield, would conflict with that effort. Let's instead do the bare minimum here as this aspect of trailers is being actively worked on by another series. Let's also test for a plain "valueonly" without any other options, as well as "separator". All the other existing options on the pretty.c path had tests where they were the only option provided. I'm also keeping a sanity test for "%(trailers:)" being the same as "%(trailers)". There's no reason to suspect it wouldn't be in the current implementation, but let's keep it in the interest of black box testing. 1. https://lore.kernel.org/git/pull.726.git.1599335291.gitgitgadget@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-09 14:16:42 -08:00
Ævar Arnfjörð Bjarmason	2762e17117	pretty format %(trailers) doc: avoid repetition Change the documentation for the various %(trailers) options so it isn't repeating part of the documentation for "only" about how boolean values are handled. Instead, let's split the description of that into general documentation at the top. It then suffices to refer to it by listing the options as "opt[=<BOOL>]". I'm also changing it to upper-case "[=<BOOL>]" from "[=val]" for consistency with "<SEP>" It took me a couple of readings to realize that these options were referring back to the "only" option's treatment of boolean values. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-09 14:16:42 -08:00
Derrick Stolee	f077b0a986	pack-bitmap-write: better reuse bitmaps If the old bitmap file contains a bitmap for a given commit, then that commit does not need help from intermediate commits in its history to compute its final bitmap. Eject that commit from the walk and insert it into a separate list of reusable commits that are eventually stored in the list of commits for computing bitmaps. This helps the repeat bitmap computation task, even if the selected commits shift drastically. This helps when a previously-bitmapped commit exists in the first-parent history of a newly-selected commit. Since we stop the walk at these commits and we use a first-parent walk, it is harder to walk "around" these bitmapped commits. It's not impossible, but we can greatly reduce the computation time for many selected commits. \| runtime (sec) \| peak heap (GB) \| \| \| \| \| from \| with \| from \| with \| \| scratch \| existing \| scratch \| existing \| -----------+---------+----------+---------+----------- last patch \| 88.478 \| 53.218 \| 2.157 \| 2.224 \| this patch \| 86.681 \| 16.164 \| 2.157 \| 2.222 \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:49:07 -08:00
Derrick Stolee	45f4eeb291	pack-bitmap-write: relax unique revwalk condition The previous commits improved the bitmap computation process for very long, linear histories with many refs by removing quadratic growth in how many objects were walked. The strategy of computing "intermediate commits" using bitmasks for which refs can reach those commits partitioned the poset of reachable objects so each part could be walked exactly once. This was effective for linear histories. However, there was a (significant) drawback: wide histories with many refs had an explosion of memory costs to compute the commit bitmasks during the exploration that discovers these intermediate commits. Since these wide histories are unlikely to repeat walking objects, the benefit of walking objects multiple times was not expensive before. But now, the commit walk before computing bitmaps is incredibly expensive. In an effort to discover a happy medium, this change reduces the walk for intermediate commits to only the first-parent history. This focuses the walk on how the histories converge, which still has significant reduction in repeat object walks. It is still possible to create quadratic behavior in this version, but it is probably less likely in realistic data shapes. Here is some data taken on a fresh clone of the kernel: \| runtime (sec) \| peak heap (GB) \| \| \| \| \| from \| with \| from \| with \| \| scratch \| existing \| scratch \| existing \| -----------+---------+----------+---------+----------- original \| 64.044 \| 83.241 \| 2.088 \| 2.194 \| last patch \| 45.049 \| 37.624 \| 2.267 \| 2.334 \| this patch \| 88.478 \| 53.218 \| 2.157 \| 2.224 \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:49:07 -08:00
Derrick Stolee	341fa34887	pack-bitmap-write: use existing bitmaps When constructing new bitmaps, we perform a commit and tree walk in fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit from using existing bitmaps when available. We must track the existing bitmaps and translate them into the new object order, but this is generally faster than parsing trees. In fill_bitmap_commit(), we must reorder thing somewhat. The priority queue walks commits from newest-to-oldest, which means we correctly stop walking when reaching a commit with a bitmap. However, if we walk trees interleaved with the commits, then we might be parsing trees that are actually part of a re-used bitmap. To avoid over-walking trees, add them to a LIFO queue and walk them after exploring commits completely. On git.git, this reduces a second immediate bitmap computation from 2.0s to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork network, we go from 227s to 198s. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:49:06 -08:00
Taylor Blau	83578051a9	pack-bitmap: factor out 'add_commit_to_bitmap()' 'find_objects()' currently needs to interact with the bitmaps khash pretty closely. To make 'find_objects()' read a little more straightforwardly, remove some of the khash-level details into a new function that describes what it does: 'add_commit_to_bitmap()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:49:06 -08:00
Taylor Blau	98c31f366a	pack-bitmap: factor out 'bitmap_for_commit()' A couple of callers within pack-bitmap.c duplicate logic to lookup a given object id in the bitamps khash. Factor this out into a new function, 'bitmap_for_commit()' to reduce some code duplication. Make this new function non-static, since it will be used in later commits from outside of pack-bitmap.c. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:49:04 -08:00
Jeff King	449fa5ee06	pack-bitmap-write: ignore BITMAP_FLAG_REUSE The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:17 -08:00
Derrick Stolee	089f751360	pack-bitmap-write: build fewer intermediate bitmaps The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph. We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process. The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit. The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C." As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C." MAXIMAL COMMITS --------------- We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not. The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example. Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks. Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks: * 100: X - (Y union Z) * 010: Y - (X union Z) * 001: Z - (X union Y) * 110: (X intersect Y) - Z * 101: (X intersect Z) - Y * 011: (Y intersect Z) - X * 111: X intersect Y intersect Z This can be visualized with the following Hasse diagram: 100 010 001 \| \ / \ / \| \| \/ \/ \| \| /\ /\ \| \| / \ / \ \| 110 101 011 \___ \| ___/ \ \| / 111 Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history. With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking? As we walk, we visit a commit C. Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this. Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values: * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P. * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for P. If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P. We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits. If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information." In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C. The final thing we need to do is to update the reverse edges for P. These reverse edges respresent "which closest maximal commits contributed bits to my commit_mask?" Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P. If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list. Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here. After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph. Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C. COMPUTING REACHABILITY BITMAPS ------------------------------ Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference. In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101). The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits? MULTIPLE MAXIMAL COMMITS ------------------------ Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order. Here is an example of four commits forming a butterfly: I J \|\ /\| \| \/ \| \| /\ \| \|/ \\| M N \ / \|/ Q Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11. Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N. While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps. In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies. The solution has already been mentioned: each of M and N are assigned their own bits to the bitmask and hence they become uniquely maximal for their bitmasks. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows: I:10 J:01 \|\ /\| \| \ _____/ \| \| /\____ \| \|/ \ \| M:111 N:1101 \ / Q:1111 Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }. PERFORMANCE MEASUREMENTS ------------------------ Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort. To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers. Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits. Repository: git/git multi: 2.628 sec reverse: 2.344 sec maximal: 2.047 sec Repository: torvalds/linux multi: 64.7 sec reverse: 205.3 sec maximal: 44.7 sec So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we come out ahead of "multi" in all cases. Likewise, peak heap has gone back to something reasonable: Repository: torvalds/linux multi: 2.087 GB reverse: 3.141 GB maximal: 2.288 GB While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:17 -08:00
Taylor Blau	c6b0c3910c	pack-bitmap.c: check reads more aggressively when loading Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should check that it can safely do so by ensuring that there are at least 6 bytes available to be read (four for the commit's index position, and then two more for the xor offset and flags, respectively). Likewise, it should check that the commit index it read refers to a legitimate object in the pack. The first fix catches a truncation bug that was exposed when testing, and the second is purely precautionary. There are some possible future improvements, not pursued here. They are: - Computing the correct boundary of the bitmap itself in the caller and ensuring that we don't read past it. This may or may not be worth it, since in a truncation situation, all bets are off: (is the trailer still there and the bitmap entries malformed, or is the trailer truncated?). The best we can do is try to read what's there as if it's correct data (and protect ourselves when it's obviously bogus). - Avoid the magic "6" by teaching read_be32() and read_u8() (both of which are custom helpers for this function) to check sizes before advancing the pointers. - Adding more tests in this area. Testing these truncation situations are remarkably fragile to even subtle changes in the bitmap generation. So, the resulting tests are likely to be quite brittle. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:17 -08:00
Derrick Stolee	928e3f42ad	pack-bitmap-write: rename children to reverse_edges The bitmap_builder_init() method walks the reachable commits in topological order and constructs a "reverse graph" along the way. At the moment, this reverse graph contains an edge from commit A to commit B if and only if A is a parent of B. Thus, the name "children" is appropriate for for this reverse graph. In the next change, we will repurpose the reverse graph to not be directly-adjacent commits in the commit-graph, but instead a more abstract relationship. The previous changes have already incorporated the necessary updates to fill_bitmap_commit() that allow these edges to not be immediate children. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:17 -08:00
Derrick Stolee	1467b9572a	t5310: add branch-based checks The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:17 -08:00
Derrick Stolee	597b2c39af	commit: implement commit_list_contains() It can be helpful to check if a commit_list contains a commit. Use pointer equality, assuming lookup_commit() was used. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Derrick Stolee	ed03a58b65	bitmap: implement bitmap_is_subset() The bitmap_is_subset() function checks if the 'self' bitmap contains any bitmaps that are not on in the 'other' bitmap. Up until this patch, it had a declaration, but no implementation or callers. A subsequent patch will want this function, so implement it here. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Derrick Stolee	6dc5ef759f	pack-bitmap-write: fill bitmap with commit history The current implementation of bitmap_writer_build() creates a reachability bitmap for every walked commit. After computing a bitmap for a commit, those bits are pushed to an in-progress bitmap for its children. fill_bitmap_commit() assumes the bits corresponding to objects reachable from the parents of a commit are already set. This means that when visiting a new commit, we only have to walk the objects reachable between it and any of its parents. A future change to bitmap_writer_build() will relax this condition so not all parents have their bits set. Prepare for that by having 'fill_bitmap_commit()' walk parents until reaching commits whose bits are already set. Then, walk the trees for these commits as well. This has no functional change with the current implementation of bitmap_writer_build(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	010e5eacfb	pack-bitmap-write: pass ownership of intermediate bitmaps Our algorithm to generate reachability bitmaps walks through the commit graph from the bottom up, passing bitmap data from each commit to its descendants. For a linear stretch of history like: A -- B -- C our sequence of steps is: - compute the bitmap for A by walking its trees, etc - duplicate A's bitmap as a starting point for B; we can now free A's bitmap, since we only needed it as an intermediate result - OR in any extra objects that B can reach into its bitmap - duplicate B's bitmap as a starting point for C; likewise, free B's bitmap - OR in objects for C, and so on... Rather than duplicating bitmaps and immediately freeing the original, we can just pass ownership from commit to commit. Note that this doesn't always work: - the recipient may be a merge which already has an intermediate bitmap from its other ancestor. In that case we have to OR our result into it. Note that the first ancestor to reach the merge does get to pass ownership, though. - we may have multiple children; we can only pass ownership to one of them However, it happens often enough and copying bitmaps is expensive enough that this provides a noticeable speedup. On a clone of linux.git, this reduces the time to generate bitmaps from 205s to 70s. This is about the same amount of time it took to generate bitmaps using our old "many traversals" algorithm (the previous commit measures the identical scenario as taking 63s). It unfortunately provides only a very modest reduction in the peak memory usage, though. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	4a9c581729	pack-bitmap-write: reimplement bitmap writing The bitmap generation code works by iterating over the set of commits for which we plan to write bitmaps, and then for each one performing a traditional traversal over the reachable commits and trees, filling in the bitmap. Between two traversals, we can often reuse the previous bitmap result as long as the first commit is an ancestor of the second. However, our worst case is that we may end up doing "n" complete complete traversals to the root in order to create "n" bitmaps. In a real-world case (the shared-storage repo consisting of all GitHub forks of chromium/chromium), we perform very poorly: generating bitmaps takes ~3 hours, whereas we can walk the whole object graph in ~3 minutes. This commit completely rewrites the algorithm, with the goal of accessing each object only once. It works roughly like this: - generate a list of commits in topo-order using a single traversal - invert the edges of the graph (so have parents point at their children) - make one pass in reverse topo-order, generating a bitmap for each commit and passing the result along to child nodes We generate correct results because each node we visit has already had all of its ancestors added to the bitmap. And we make only two linear passes over the commits. We also visit each tree usually only once. When filling in a bitmap, we don't bother to recurse into trees whose bit is already set in the bitmap (since we know we've already done so when setting their bit). That means that if commit A references tree T, none of its descendants will need to open T again. I say "usually", though, because it is possible for a given tree to be mentioned in unrelated parts of history (e.g., cherry-picking to a parallel branch). So we've accomplished our goal, and the resulting algorithm is pretty simple to understand. But there are some downsides, at least with this initial implementation: - we no longer reuse the results of any on-disk bitmaps when generating. So we'd expect to sometimes be slower than the original when bitmaps already exist. However, this is something we'll be able to add back in later. - we use much more memory. Instead of keeping one bitmap in memory at a time, we're passing them up through the graph. So our memory use should scale with the graph width (times the size of a bitmap). So how does it perform? For a clone of linux.git, generating bitmaps from scratch with the old algorithm took 63s. Using this algorithm it takes 205s. Which is much worse, but _might_ be acceptable if it behaved linearly as the size grew. It also increases peak heap usage by ~1G. That's not impossibly large, but not encouraging. On the complete fork-network of torvalds/linux, it increases the peak RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the memory usage was too much to consider this reasonable anyway). On the complete fork-network of chromium/chromium, I ran out of memory before succeeding. Some back-of-the-envelope calculations indicate it would need 80+GB to complete. So at this stage, we've managed to make things much worse. But because of the way this new algorithm is structured, there are a lot of opportunities for optimization on top. We'll start implementing those in the follow-on patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	ccae08e822	ewah: add bitmap_dup() function There's no easy way to make a copy of a bitmap. Obviously a caller can iterate over the bits and set them one by one in a new bitmap, but we can go much faster by copying whole words with memcpy(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	3ed675101a	ewah: implement bitmap_or() We have a function to bitwise-OR an ewah into an uncompressed bitmap, but not to OR two uncompressed bitmaps. Let's add it. Interestingly, we have a public header declaration going back to `e1273106f6` (ewah: compressed bitmap implementation, 2013-11-14), but the function was never implemented. That was all OK since there were no users of 'bitmap_or()', but a first caller will be added in a couple of patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	2e2d141afd	ewah: make bitmap growth less aggressive If you ask to set a bit in the Nth word and we haven't yet allocated that many slots in our array, we'll increase the bitmap size to 2N. This means we might frequently end up with bitmaps that are twice the necessary size (as soon as you ask for the biggest bit, we'll size up to twice that). But if we just allocate as many words as were asked for, we may not grow fast enough. The worst case there is setting bit 0, then 1, etc. Each time we grow we'd just extend by one more word, giving us linear reallocations (and quadratic memory copies). A middle ground is relying on alloc_nr(), which causes us to grow by a factor of roughly 3/2 instead of 2. That's less aggressive than doubling, and it may help avoid fragmenting memory. (If we start with N, then grow twice, our total is N(3/2)^2 = 9N/4. After growing twice, that array of size 9N/4 can fit into the space vacated by the original array and first growth, N+3N/2 = 10N/4 > 9N/4, leading to less fragmentation in memory). Our worst case is still 3/2N wasted bits (you set bit N-1, then setting bit N causes us to grow by 3/2), but our average should be much better. This isn't usually that big a deal, but it will matter as we shift the reachability bitmap generation code to store more bitmaps in memory. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	d574bf43e8	ewah: factor out bitmap growth We auto-grow bitmaps when somebody asks to set a bit whose position is outside of our currently allocated range. Other operations besides single bit-setting might need to do this, too, so let's pull it into its own function. Note that we change the semantics a little: you now ask for the number of words you'd like to have, not the id of the block you'd like to write to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:16 -08:00
Jeff King	2978b00691	rev-list: die when --test-bitmap detects a mismatch You can use "git rev-list --test-bitmap HEAD" to check that bitmaps produce the same answer we'd get from a regular traversal. But if we detect an error, we only print "mismatch", and still exit with a successful error code. That makes the uses of --test-bitmap in the test suite (e.g., in t5310) mostly pointless: even if we saw an error, the tests wouldn't notice. Let's instead call die(), which will let these tests work as designed, and alert us if the bitmaps are bogus. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:15 -08:00
Jeff King	c5cd749076	t5310: drop size of truncated ewah bitmap We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in `9d2e330b17` (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file. Signed-off-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:15 -08:00
Jeff King	ec6c7b4367	pack-bitmap: bounds-check size of cache extension A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:15 -08:00
Jeff King	ca51090200	pack-bitmap: fix header size check When we parse a .bitmap header, we first check that we have enough bytes to make a valid header. We do that based on sizeof(struct bitmap_disk_header). However, as of `0f4d6cada8` (pack-bitmap: make bitmap header handling hash agnostic, 2019-02-19), that struct oversizes its checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the difference between that constant and the size of the actual hash we're using. That commit adjusted the code which moves our pointer forward, but forgot to update the size check. This meant we were overly strict about the header size (requiring room for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in practice it didn't matter because bitmap files tend to have at least 12 bytes of actual data anyway, so it was unlikely for a valid file to be caught by this. Let's fix it by pulling the header size into a separate variable and using it in both spots. That fixes the bug and simplifies the code to make it harder to have a mismatch like this in the future. It will also come in handy in the next patch for more bounds checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:15 -08:00
Taylor Blau	3b1ca60f8f	ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() 'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer used to store the bits of an EWAH bitmap. It is essentially doing the same task as the 'ALLOC_GROW()' macro, so use that instead. This simplifies the callers of 'buffer_grow()', who no longer have to ask for a specific size, but rather specify how much of the buffer they need. They also no longer need to guard 'buffer_grow()' behind an if statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is a noop if the buffer is already large enough. But, the most significant change is that this fixes a bug when calling buffer_grow() with both 'alloc_size' and 'new_size' set to 1. In this case, truncating integer math will leave the new size set to 1, causing the buffer to never grow. Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3 / 2' instead of 'new_size * 3 / 2'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:15 -08:00
Sangeeta Jain	8ef9312464	diff: do not show submodule with untracked files as "-dirty" Git diff reports a submodule directory as -dirty even when there are only untracked files in the submodule directory. This is inconsistent with what `git describe --dirty` says when run in the submodule directory in that state. Make `--ignore-submodules=untracked` the default for `git diff` when there is no configuration variable or command line option, so that the command would not give '-dirty' suffix to a submodule whose working tree has untracked files, to make it consistent with `git describe --dirty` that is run in the submodule working tree. And also make `--ignore-submodules=none` the default for `git status` so that the user doesn't end up deleting a submodule that has uncommitted (untracked) files. Signed-off-by: Sangeeta Jain <sangunb09@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:27:35 -08:00
Ævar Arnfjörð Bjarmason	7c1f79fc16	pretty format %(trailers) test: split a long line Split a very long line in a test introduced in `0b691d8685` (pretty: add support for separator option in %(trailers), 2019-01-28). This makes it easier to read, especially as follow-up commits will copy this test as a template. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-07 10:23:11 -08:00
Derrick Stolee	16c5690929	maintenance: include 'cron' details in docs Advanced and expert users may want to know how 'git maintenance start' schedules background maintenance in order to customize their own schedules beyond what the maintenance.* config values allow. Start a new set of sections in git-maintenance.txt that describe how 'cron' is used to run these tasks. This is particularly valuable for users who want to inspect what Git is doing or for users who want to customize the schedule further. Having a baseline can provide a way forward for users who have never worked with cron schedules. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-24 13:02:29 -08:00
Derrick Stolee	31345d5545	maintenance: extract platform-specific scheduling The existing schedule mechanism using 'cron' is supported by POSIX platforms, but not Windows. It also works slightly differently on macOS to significant detriment of the user experience. To allow for new implementations on these platforms, extract a method that performs the platform-specific scheduling mechanism. This will be swapped at compile time with new implementations on specialized platforms. As we add this generality, rename GIT_TEST_CRONTAB to GIT_TEST_MAINT_SCHEDULER. Further, this variable is now parsed as "<scheduler>:<command>" so we can test platform-specific scheduling logic even when not on the correct platform. By specifying the <scheduler> in this string, we will be able to test all three sets of Git logic from a Linux machine. Co-authored-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-24 13:02:29 -08:00
Johannes Schindelin	8b70966aa9	tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed We introduced the `PREPARE_FOR_MAIN_BRANCH` prereq for the sole purpose of allowing us to perform the non-trivial adjustments regarding the `master` -> `main` rename before the automatable ones. Now that the transition is almost complete, we can stop using it in most instances. The only two exceptions are t5526 and t9902: at the time of writing, there are other patches in flight that touch these test scripts, therefore their transition to `main` is postponed to a later date. This patch is the result of this command: sed -i 's/PREPARE_FOR_MAIN_BRANCH[ ,]//' t/t[0-9].sh && git checkout HEAD -- t/t5526\ t/t9902\* Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	8dcf73c5c9	t99: adjust the references to the default branch name "main" Carefully excluding t9902, which sees independent development elsewhere at the time of writing, we use `main` as the default branch name in t9903. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t99.sh lib-cvs.sh && git checkout HEAD -- t9902\*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for all tests (except the ones we specifically excluded for now). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	46a29020bb	tests(git-p4): transition to the default branch name `main` In the previous commits, we adjusted the test suite to use the branch name `main` for initial branches. The `git p4`-related tests are a bit harder to adjust because `git p4` uses the ref `refs/heads/p4/master` to track the remote branches, and for now, we do not want to change that (this might be the subject of a future patch series). We only need to adjust for the actual initial branch name to be changed to `main`. This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	765577b5d0	t9[5-7]: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t9[5-7].sh lib-cvs.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	a881baa2c3	t9[0-4]: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t9[0-4].sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	747f6c6805	t8: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t8.sh annotate*.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	1e2ae142c0	t7[5-9]: adjust the references to the default branch name "main" Excluding t7817, which is added in an unrelated patch series at the time of writing, this adjusts t7[5-9]. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t7[5-9]*.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	01dc81336d	t7[0-4]: adjust the references to the default branch name "main" Carefully excluding t7064, which sees independent development elsewhere at the time of writing, we use `main` as the default branch name in t7[0-4]. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t7[0-4].sh && git checkout HEAD -- t7064\) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	5902f5f460	t6[4-9]: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t6[4-9].sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	1f53df54eb	t64*: preemptively adjust alignment to prepare for `master` -> `main` We are in the process of renaming the default branch name to `main`, which is two characters shorter than `master`. Therefore, some lines need to be adjusted in t6416, t6422 and t6427 that want to align text involving the default branch name. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	1550bb6ed0	t6[0-3]: adjust the references to the default branch name "main" Carefully excluding t6300, which sees independent development elsewhere at the time of writing, we use `main` as the default branch name in t6[0-3]. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t6[0-3].sh && git checkout HEAD -- t6300\) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	95cf2c0187	t5[6-9]: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t5[6-9].sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	028cb644ec	t55[4-9]: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -e 's/retsam/niam/g' \ -- t55[4-9].sh t556x*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Note that t5541 uses the reversed `master` name: `retsam`. We replace it by the equivalent for `main`: `niam`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	3ac8f6301e	t55[23]: adjust the references to the default branch name "main" Carefully excluding t5526, which sees independent development elsewhere at the time of writing, we use `main` as the default branch name in t55[23]. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -e 's/naster/nain/g' -- \ t55[23].sh && git checkout HEAD -- t5526\) Note that t5533 contains a variation of the name `master` (`naster`) that we rename here, too. This commit allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for that range of tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	bc925ce3f3	t551: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t551.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	3275f4e886	t550: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t550.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	e4010de9f0	t5503: prepare aligned comment for replacing `master` with `main` In an upcoming commit, we will use `main` as the default branch name in t5503 instead of `master`. This will require extra padding in ASCII-art commit graphs, which we hereby add preemptively. By doing this preemptively rather than after the commit applying the search-and-replace, it is more obvious that we caught all aligned comments that are affected by the latter commit. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	966b4be276	t5[0-4]: adjust the references to the default branch name "main" Carefully excluding t5310, which is developed independently of the current patch series at the time of writing, we now use `main` as default branch in t5[0-4]. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t5[0-4].sh && git checkout HEAD -- t5310\) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	4b071211e6	t5323: prepare centered comment for `master` -> `main` We are about to search-and-replace all mentions of `master` in t5323 by `main`, which is two characters shorter. To prepare for that, let's add padding to centered lines that will make them briefly uncentered, but will be re-centered in the commit that performs that rename. Doing it this way (instead of padding after replacing) makes it easier to verify the validity of the patch that replaces `master` by `main`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	8f37854b18	t4: adjust the references to the default branch name "main" Carefully excluding t4013 and t4015, which see independent development elsewhere at the time of writing, we use `main` as the default branch name in t4. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t4.sh t4211/.export && git checkout HEAD -- t4013\*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	cbc75a12f0	t3[5-9]: adjust the references to the default branch name "main" This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t3[5-9].sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	d1c02d93b3	t34: adjust the references to the default branch name "main" Carefully excluding t3404, which sees independent development elsewhere at the time of writing, we use `main` as the default branch name in t34. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t34.sh && git checkout HEAD -- t34\) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	ba766eebee	t3416: preemptively adjust alignment in a comment We are about to adjust t3416 for the new default branch name `main`. This name is two characters shorter and therefore needs two spaces more padding to align correctly. Adjusting the alignment before the big search-and-replace makes it easier to verify that the final result does not leave any misaligned lines behind. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	d6c6b10817	t3[0-3]: adjust the references to the default branch name "main" Carefully excluding t3040, which sees independent development elsewhere at the time of writing, we transition above-mentioned tests to the default branch name `main`. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t3[0-3].sh t3206/* && git checkout HEAD -- t3040\*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	883b98efad	t2: adjust the references to the default branch name "main" Carefully excluding t2106, which sees independent development elsewhere at the time of writing, we transition above-mentioned tests to the default branch name `main`. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t2.sh && git checkout HEAD -- t2106\*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	06d531486e	t[01]: adjust the references to the default branch name "main" Carefully excluding t1309, which sees independent development elsewhere at the time of writing, we transition above-mentioned tests to the default branch name `main`. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -e 's/naster/nain/g' -- t[01].sh && git checkout HEAD -- t1309\*) Note that t5533 contains a variation of the name `master` (`naster`) that we rename here, too. This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	c2fdc8820c	t0060: preemptively adjust alignment We are about to adjust t0060 for the new default branch name `main`. This name is two characters shorter and therefore needs two spaces more padding to align correctly. Adjusting the alignment before the big search-and-replace makes it easier to verify that the final result does not leave any misaligned lines behind. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:17 -08:00
Johannes Schindelin	334afbc76f	tests: mark tests relying on the current default for `init.defaultBranch` In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ \. \.\/$test-lib\\|lib-\(bash\\|cvs\\|git-svn$\\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9].sh) \ t/t4211.sh t/t5560.sh t/t8002.sh t/t8012.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ \. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167].sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:17 -08:00
Junio C Hamano	fced6d171e	Merge 'jk/diff-release-filespec-fix' into js/default-branch-name-tests-final-stretch * jk/diff-release-filespec-fix: t7800: simplify difftool test diff: allow passing NULL to diff_free_filespec_data()	2020-11-19 15:27:59 -08:00
Junio C Hamano	6d37ca2165	Merge branch 'en/strmap' into en/merge-ort-impl * en/strmap: shortlog: use strset from strmap.h Use new HASHMAP_INIT macro to simplify hashmap initialization strmap: take advantage of FLEXPTR_ALLOC_STR when relevant strmap: enable allocations to come from a mem_pool strmap: add a strset sub-type strmap: split create_entry() out of strmap_put() strmap: add functions facilitating use as a string->int map strmap: enable faster clearing and reusing of strmaps strmap: add more utility functions strmap: new utility functions hashmap: provide deallocation function names hashmap: introduce a new hashmap_partial_clear() hashmap: allow re-use after hashmap_free() hashmap: adjust spacing to fix argument alignment hashmap: add usage documentation explaining hashmap_free[_entries]()	2020-11-11 12:56:29 -08:00

846 changed files with 100812 additions and 54718 deletions

1

.gitattributes vendored

View File

 @ -6,6 +6,7 @@
 *.pm eol=lf diff=perl
 *.py eol=lf diff=python
 *.bat eol=crlf
 CODE_OF_CONDUCT.md -whitespace
 /Documentation/**/*.txt eol=lf
 /command-list.txt eol=lf
 /GIT-VERSION-GEN eol=lf

									
										2

.github/workflows/main.yml
									
										vendored
									
												View File
												
				@ -289,7 +289,7 @@ jobs:

				          - jobname: osx-gcc

				            cc: gcc

				            pool: macos-latest

				          - jobname: GETTEXT_POISON

				          - jobname: linux-gcc-default

				            cc: gcc

				            pool: ubuntu-latest

				    env:

									
										2

.travis.yml
									
												View File
												
				@ -16,7 +16,7 @@ compiler:

				matrix:

				  include:

				    - env: jobname=GETTEXT_POISON

				    - env: jobname=linux-gcc-default

				      os: linux

				      compiler:

				      addons:

									
										154

CODE_OF_CONDUCT.md
									
												View File
												
				@ -8,73 +8,64 @@ this code of conduct may be banned from the community.

				## Our Pledge

				In the interest of fostering an open and welcoming environment, we as

				contributors and maintainers pledge to make participation in our project and

				our community a harassment-free experience for everyone, regardless of age,

				body size, disability, ethnicity, sex characteristics, gender identity and

				expression, level of experience, education, socio-economic status,

				nationality, personal appearance, race, religion, or sexual identity and

				orientation.

				We as members, contributors, and leaders pledge to make participation in our

				community a harassment-free experience for everyone, regardless of age, body

				size, visible or invisible disability, ethnicity, sex characteristics, gender

				identity and expression, level of experience, education, socio-economic status,

				nationality, personal appearance, race, religion, or sexual identity

				and orientation.

				We pledge to act and interact in ways that contribute to an open, welcoming,

				diverse, inclusive, and healthy community.

				## Our Standards

				Examples of behavior that contributes to creating a positive environment

				include:

				Examples of behavior that contributes to a positive environment for our

				community include:

				* Using welcoming and inclusive language

				* Being respectful of differing viewpoints and experiences

				* Gracefully accepting constructive criticism

				* Focusing on what is best for the community

				* Showing empathy towards other community members

				* Demonstrating empathy and kindness toward other people

				* Being respectful of differing opinions, viewpoints, and experiences

				* Giving and gracefully accepting constructive feedback

				* Accepting responsibility and apologizing to those affected by our mistakes,

				  and learning from the experience

				* Focusing on what is best not just for us as individuals, but for the

				  overall community

				Examples of unacceptable behavior by participants include:

				Examples of unacceptable behavior include:

				* The use of sexualized language or imagery and unwelcome sexual attention or

				  advances

				* Trolling, insulting/derogatory comments, and personal or political attacks

				* The use of sexualized language or imagery, and sexual attention or

				  advances of any kind

				* Trolling, insulting or derogatory comments, and personal or political attacks

				* Public or private harassment

				* Publishing others' private information, such as a physical or electronic

				  address, without explicit permission

				* Publishing others' private information, such as a physical or email

				  address, without their explicit permission

				* Other conduct which could reasonably be considered inappropriate in a

				  professional setting

				## Our Responsibilities

				## Enforcement Responsibilities

				Project maintainers are responsible for clarifying the standards of acceptable

				behavior and are expected to take appropriate and fair corrective action in

				response to any instances of unacceptable behavior.

				Community leaders are responsible for clarifying and enforcing our standards of

				acceptable behavior and will take appropriate and fair corrective action in

				response to any behavior that they deem inappropriate, threatening, offensive,

				or harmful.

				Project maintainers have the right and responsibility to remove, edit, or

				reject comments, commits, code, wiki edits, issues, and other contributions

				that are not aligned to this Code of Conduct, or to ban temporarily or

				permanently any contributor for other behaviors that they deem inappropriate,

				threatening, offensive, or harmful.

				Community leaders have the right and responsibility to remove, edit, or reject

				comments, commits, code, wiki edits, issues, and other contributions that are

				not aligned to this Code of Conduct, and will communicate reasons for moderation

				decisions when appropriate.

				## Scope

				This Code of Conduct applies within all project spaces, and it also applies

				when an individual is representing the project or its community in public

				spaces. Examples of representing a project or community include using an

				official project e-mail address, posting via an official social media account,

				or acting as an appointed representative at an online or offline event.

				Representation of a project may be further defined and clarified by project

				maintainers.

				This Code of Conduct applies within all community spaces, and also applies when

				an individual is officially representing the community in public spaces.

				Examples of representing our community include using an official e-mail address,

				posting via an official social media account, or acting as an appointed

				representative at an online or offline event.

				## Enforcement

				Instances of abusive, harassing, or otherwise unacceptable behavior may be

				reported by contacting the project team at git@sfconservancy.org. All

				complaints will be reviewed and investigated and will result in a response

				that is deemed necessary and appropriate to the circumstances. The project

				team is obligated to maintain confidentiality with regard to the reporter of

				an incident. Further details of specific enforcement policies may be posted

				separately.

				Project maintainers who do not follow or enforce the Code of Conduct in good

				faith may face temporary or permanent repercussions as determined by other

				members of the project's leadership.

				The project leadership team can be contacted by email as a whole at

				reported to the community leaders responsible for enforcement at

				git@sfconservancy.org, or individually:

				  - Ævar Arnfjörð Bjarmason <avarab@gmail.com>

				@ -82,12 +73,73 @@ git@sfconservancy.org, or individually:

				  - Jeff King <peff@peff.net>

				  - Junio C Hamano <gitster@pobox.com>

				All complaints will be reviewed and investigated promptly and fairly.

				All community leaders are obligated to respect the privacy and security of the

				reporter of any incident.

				## Enforcement Guidelines

				Community leaders will follow these Community Impact Guidelines in determining

				the consequences for any action they deem in violation of this Code of Conduct:

				### 1. Correction

				**Community Impact**: Use of inappropriate language or other behavior deemed

				unprofessional or unwelcome in the community.

				**Consequence**: A private, written warning from community leaders, providing

				clarity around the nature of the violation and an explanation of why the

				behavior was inappropriate. A public apology may be requested.

				### 2. Warning

				**Community Impact**: A violation through a single incident or series

				of actions.

				**Consequence**: A warning with consequences for continued behavior. No

				interaction with the people involved, including unsolicited interaction with

				those enforcing the Code of Conduct, for a specified period of time. This

				includes avoiding interactions in community spaces as well as external channels

				like social media. Violating these terms may lead to a temporary or

				permanent ban.

				### 3. Temporary Ban

				**Community Impact**: A serious violation of community standards, including

				sustained inappropriate behavior.

				**Consequence**: A temporary ban from any sort of interaction or public

				communication with the community for a specified period of time. No public or

				private interaction with the people involved, including unsolicited interaction

				with those enforcing the Code of Conduct, is allowed during this period.

				Violating these terms may lead to a permanent ban.

				### 4. Permanent Ban

				**Community Impact**: Demonstrating a pattern of violation of community

				standards, including sustained inappropriate behavior,  harassment of an

				individual, or aggression toward or disparagement of classes of individuals.

				**Consequence**: A permanent ban from any sort of public interaction within

				the community.

				## Attribution

				This Code of Conduct is adapted from the [Contributor Covenant][homepage],

				version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

				version 2.0, available at

				[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].

				Community Impact Guidelines were inspired by 

				[Mozilla's code of conduct enforcement ladder][Mozilla CoC].

				For answers to common questions about this code of conduct, see the FAQ at

				[https://www.contributor-covenant.org/faq][FAQ]. Translations are available 

				at [https://www.contributor-covenant.org/translations][translations].

				[homepage]: https://www.contributor-covenant.org

				[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html

				[Mozilla CoC]: https://github.com/mozilla/diversity

				[FAQ]: https://www.contributor-covenant.org/faq

				[translations]: https://www.contributor-covenant.org/translations

				For answers to common questions about this code of conduct, see

				https://www.contributor-covenant.org/faq

									
										1

Documentation/Makefile
									
												View File
												
				@ -21,6 +21,7 @@ MAN1_TXT += gitweb.txt

				MAN5_TXT += gitattributes.txt

				MAN5_TXT += githooks.txt

				MAN5_TXT += gitignore.txt

				MAN5_TXT += gitmailmap.txt

				MAN5_TXT += gitmodules.txt

				MAN5_TXT += gitrepository-layout.txt

				MAN5_TXT += gitweb.conf.txt

2

Documentation/MyFirstContribution.txt

View File

 @ -664,7 +664,7 @@ mention the right animal somewhere:
 ----
 test_expect_success 'runs correctly with no args and good output' '
 	git psuh >actual &&
 	test_i18ngrep Pony actual
 	grep Pony actual
 '
 ----

86

Documentation/RelNotes/2.30.7.txt Normal file

View File

 @ -0,0 +1,86 @@
 Git v2.30.7 Release Notes
 =========================
 This release addresses the security issues CVE-2022-41903 and
 CVE-2022-23521.
 Fixes since v2.30.6
 -------------------
  * CVE-2022-41903:
    git log has the ability to display commits using an arbitrary
    format with its --format specifiers. This functionality is also
    exposed to git archive via the export-subst gitattribute.
    When processing the padding operators (e.g., %<(, %<|(, %>(,
    %>>(, or %><( ), an integer overflow can occur in
    pretty.c::format_and_pad_commit() where a size_t is improperly
    stored as an int, and then added as an offset to a subsequent
    memcpy() call.
    This overflow can be triggered directly by a user running a
    command which invokes the commit formatting machinery (e.g., git
    log --format=...). It may also be triggered indirectly through
    git archive via the export-subst mechanism, which expands format
    specifiers inside of files within the repository during a git
    archive.
    This integer overflow can result in arbitrary heap writes, which
    may result in remote code execution.
 * CVE-2022-23521:
     gitattributes are a mechanism to allow defining attributes for
     paths. These attributes can be defined by adding a `.gitattributes`
     file to the repository, which contains a set of file patterns and
     the attributes that should be set for paths matching this pattern.
     When parsing gitattributes, multiple integer overflows can occur
     when there is a huge number of path patterns, a huge number of
     attributes for a single pattern, or when the declared attribute
     names are huge.
     These overflows can be triggered via a crafted `.gitattributes` file
     that may be part of the commit history. Git silently splits lines
     longer than 2KB when parsing gitattributes from a file, but not when
     parsing them from the index. Consequentially, the failure mode
     depends on whether the file exists in the working tree, the index or
     both.
     This integer overflow can result in arbitrary heap reads and writes,
     which may result in remote code execution.
 Credit for finding CVE-2022-41903 goes to Joern Schneeweisz of GitLab.
 An initial fix was authored by Markus Vervier of X41 D-Sec. Credit for
 finding CVE-2022-23521 goes to Markus Vervier and Eric Sesterhenn of X41
 D-Sec. This work was sponsored by OSTIF.
 The proposed fixes have been polished and extended to cover additional
 findings by Patrick Steinhardt of GitLab, with help from others on the
 Git security mailing list.
 Patrick Steinhardt (21):
       attr: fix overflow when upserting attribute with overly long name
       attr: fix out-of-bounds read with huge attribute names
       attr: fix integer overflow when parsing huge attribute names
       attr: fix out-of-bounds write when parsing huge number of attributes
       attr: fix out-of-bounds read with unreasonable amount of patterns
       attr: fix integer overflow with more than INT_MAX macros
       attr: harden allocation against integer overflows
       attr: fix silently splitting up lines longer than 2048 bytes
       attr: ignore attribute lines exceeding 2048 bytes
       attr: ignore overly large gitattributes files
       pretty: fix out-of-bounds write caused by integer overflow
       pretty: fix out-of-bounds read when left-flushing with stealing
       pretty: fix out-of-bounds read when parsing invalid padding format
       pretty: fix adding linefeed when placeholder is not expanded
       pretty: fix integer overflow in wrapping format
       utf8: fix truncated string lengths in `utf8_strnwidth()`
       utf8: fix returning negative string width
       utf8: fix overflow when returning string width
       utf8: fix checking for glyph width in `strbuf_utf8_replace()`
       utf8: refactor `strbuf_utf8_replace` to not rely on preallocated buffer
       pretty: restrict input lengths for padding and wrapping formats

52

Documentation/RelNotes/2.30.8.txt Normal file

View File

 @ -0,0 +1,52 @@
 Git v2.30.8 Release Notes
 =========================
 This release addresses the security issues CVE-2023-22490 and
 CVE-2023-23946.
 Fixes since v2.30.7
 -------------------
  * CVE-2023-22490:
    Using a specially-crafted repository, Git can be tricked into using
    its local clone optimization even when using a non-local transport.
    Though Git will abort local clones whose source $GIT_DIR/objects
    directory contains symbolic links (c.f., CVE-2022-39253), the objects
    directory itself may still be a symbolic link.
    These two may be combined to include arbitrary files based on known
    paths on the victim's filesystem within the malicious repository's
    working copy, allowing for data exfiltration in a similar manner as
    CVE-2022-39253.
  * CVE-2023-23946:
    By feeding a crafted input to "git apply", a path outside the
    working tree can be overwritten as the user who is running "git
    apply".
  * A mismatched type in `attr.c::read_attr_from_index()` which could
    cause Git to errantly reject attributes on Windows and 32-bit Linux
    has been corrected.
 Credit for finding CVE-2023-22490 goes to yvvdwf, and the fix was
 developed by Taylor Blau, with additional help from others on the
 Git security mailing list.
 Credit for finding CVE-2023-23946 goes to Joern Schneeweisz, and the
 fix was developed by Patrick Steinhardt.
 Johannes Schindelin (1):
       attr: adjust a mismatched data type
 Patrick Steinhardt (1):
       apply: fix writing behind newly created symbolic links
 Taylor Blau (3):
       t5619: demonstrate clone_local() with ambiguous transport
       clone: delay picking a transport until after get_repo_path()
       dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS

365

Documentation/RelNotes/2.31.0.txt Normal file

View File

 @ -0,0 +1,365 @@
 Git 2.31 Release Notes
 ======================
 Updates since v2.30
 -------------------
 Backward incompatible and other important changes
  * The "pack-redundant" command, which has been left stale with almost
    unusable performance issues, now warns loudly when it gets used, as
    we no longer want to recommend its use (instead just "repack -d"
    instead).
  * The development community has adopted Contributor Covenant v2.0 to
    update from v1.4 that we have been using.
  * The support for deprecated PCRE1 library has been dropped.
  * Fixes for CVE-2021-21300 in Git 2.30.2 (and earlier) is included.
 UI, Workflows & Features
  * The "--format=%(trailers)" mechanism gets enhanced to make it
    easier to design output for machine consumption.
  * When a user does not tell "git pull" to use rebase or merge, the
    command gives a loud message telling a user to choose between
    rebase or merge but creates a merge anyway, forcing users who would
    want to rebase to redo the operation.  Fix an early part of this
    problem by tightening the condition to give the message---there is
    no reason to stop or force the user to choose between rebase or
    merge if the history fast-forwards.
  * The configuration variable 'core.abbrev' can be set to 'no' to
    force no abbreviation regardless of the hash algorithm.
  * "git rev-parse" can be explicitly told to give output as absolute
    or relative path with the `--path-format=(absolute|relative)` option.
  * Bash completion (in contrib/) update to make it easier for
    end-users to add completion for their custom "git" subcommands.
  * "git maintenance" learned to drive scheduled maintenance on
    platforms whose native scheduling methods are not 'cron'.
  * After expiring a reflog and making a single commit, the reflog for
    the branch would record a single entry that knows both @{0} and
    @{1}, but we failed to answer "what commit were we on?", i.e. @{1}
  * "git bundle" learns "--stdin" option to read its refs from the
    standard input.  Also, it now does not lose refs whey they point
    at the same object.
  * "git log" learned a new "--diff-merges=<how>" option.
  * "git ls-files" can and does show multiple entries when the index is
    unmerged, which is a source for confusion unless -s/-u option is in
    use.  A new option --deduplicate has been introduced.
  * `git worktree list` now annotates worktrees as prunable, shows
    locked and prunable attributes in --porcelain mode, and gained
    a --verbose option.
  * "git clone" tries to locally check out the branch pointed at by
    HEAD of the remote repository after it is done, but the protocol
    did not convey the information necessary to do so when copying an
    empty repository.  The protocol v2 learned how to do so.
  * There are other ways than ".." for a single token to denote a
    "commit range", namely "<rev>^!" and "<rev>^-<n>", but "git
    range-diff" did not understand them.
  * The "git range-diff" command learned "--(left|right)-only" option
    to show only one side of the compared range.
  * "git mergetool" feeds three versions (base, local and remote) of
    a conflicted path unmodified.  The command learned to optionally
    prepare these files with unconflicted parts already resolved.
  * The .mailmap is documented to be read only from the root level of a
    working tree, but a stray file in a bare repository also was read
    by accident, which has been corrected.
  * "git maintenance" tool learned a new "pack-refs" maintenance task.
  * The error message given when a configuration variable that is
    expected to have a boolean value has been improved.
  * Signed commits and tags now allow verification of objects, whose
    two object names (one in SHA-1, the other in SHA-256) are both
    signed.
  * "git rev-list" command learned "--disk-usage" option.
  * "git {diff,log} --{skip,rotate}-to=<path>" allows the user to
    discard diff output for early paths or move them to the end of the
    output.
  * "git difftool" learned "--skip-to=<path>" option to restart an
    interrupted session from an arbitrary path.
  * "git grep" has been tweaked to be limited to the sparse checkout
    paths.
  * "git rebase --[no-]fork-point" gained a configuration variable
    rebase.forkPoint so that users do not have to keep specifying a
    non-default setting.
 Performance, Internal Implementation, Development Support etc.
  * A 3-year old test that was not testing anything useful has been
    corrected.
  * Retire more names with "sha1" in it.
  * The topological walk codepath is covered by new trace2 stats.
  * Update the Code-of-conduct to version 2.0 from the upstream (we've
    been using version 1.4).
  * "git mktag" validates its input using its own rules before writing
    a tag object---it has been updated to share the logic with "git
    fsck".
  * Two new ways to feed configuration variable-value pairs via
    environment variables have been introduced, and the way
    GIT_CONFIG_PARAMETERS encodes variable/value pairs has been tweaked
    to make it more robust.
  * Tests have been updated so that they do not to get affected by the
    name of the default branch "git init" creates.
  * "git fetch" learns to treat ref updates atomically in all-or-none
    fashion, just like "git push" does, with the new "--atomic" option.
  * The peel_ref() API has been replaced with peel_iterated_oid().
  * The .use_shell flag in struct child_process that is passed to
    run_command() API has been clarified with a bit more documentation.
  * Document, clean-up and optimize the code around the cache-tree
    extension in the index.
  * The ls-refs protocol operation has been optimized to narrow the
    sub-hierarchy of refs/ it walks to produce response.
  * When removing many branches and tags, the code used to do so one
    ref at a time.  There is another API it can use to delete multiple
    refs, and it makes quite a lot of performance difference when the
    refs are packed.
  * The "pack-objects" command needs to iterate over all the tags when
    automatic tag following is enabled, but it actually iterated over
    all refs and then discarded everything outside "refs/tags/"
    hierarchy, which was quite wasteful.
  * A perf script was made more portable.
  * Our setting of GitHub CI test jobs were a bit too eager to give up
    once there is even one failure found.  Tweak the knob to allow
    other jobs keep running even when we see a failure, so that we can
    find more failures in a single run.
  * We've carried compatibility codepaths for compilers without
    variadic macros for quite some time, but the world may be ready for
    them to be removed.  Force compilation failure on exotic platforms
    where variadic macros are not available to find out who screams in
    such a way that we can easily revert if it turns out that the world
    is not yet ready.
  * Code clean-up to ensure our use of hashtables using object names as
    keys use the "struct object_id" objects, not the raw hash values.
  * Lose the debugging aid that may have been useful in the past, but
    no longer is, in the "grep" codepaths.
  * Some pretty-format specifiers do not need the data in commit object
    (e.g. "%H"), but we were over-eager to load and parse it, which has
    been made even lazier.
  * Get rid of "GETTEXT_POISON" support altogether, which may or may
    not be controversial.
  * Introduce an on-disk file to record revindex for packdata, which
    traditionally was always created on the fly and only in-core.
  * The commit-graph learned to use corrected commit dates instead of
    the generation number to help topological revision traversal.
  * Piecemeal of rewrite of "git bisect" in C continues.
  * When a pager spawned by us exited, the trace log did not record its
    exit status correctly, which has been corrected.
  * Removal of GIT_TEST_GETTEXT_POISON continues.
  * The code to implement "git merge-base --independent" was poorly
    done and was kept from the very beginning of the feature.
  * Preliminary changes to fsmonitor integration.
  * Performance improvements for rename detection.
  * The common code to deal with "chunked file format" that is shared
    by the multi-pack-index and commit-graph files have been factored
    out, to help codepaths for both filetypes to become more robust.
  * The approach to "fsck" the incoming objects in "index-pack" is
    attractive for performance reasons (we have them already in core,
    inflated and ready to be inspected), but fundamentally cannot be
    applied fully when we receive more than one pack stream, as a tree
    object in one pack may refer to a blob object in another pack as
    ".gitmodules", when we want to inspect blobs that are used as
    ".gitmodules" file, for example.  Teach "index-pack" to emit
    objects that must be inspected later and check them in the calling
    "fetch-pack" process.
  * The logic to handle "trailer" related placeholders in the
    "--format=" mechanisms in the "log" family and "for-each-ref"
    family is getting unified.
  * Raise the buffer size used when writing the index file out from
    (obviously too small) 8kB to (clearly sufficiently large) 128kB.
  * It is reported that open() on some platforms (e.g. macOS Big Sur)
    can return EINTR even though our timers are set up with SA_RESTART.
    A workaround has been implemented and enabled for macOS to rerun
    open() transparently from the caller when this happens.
 Fixes since v2.30
 -----------------
  * Diagnose command line error of "git rebase" early.
  * Clean up option descriptions in "git cmd --help".
  * "git stash" did not work well in a sparsely checked out working
    tree.
  * Some tests expect that "ls -l" output has either '-' or 'x' for
    group executable bit, but setgid bit can be inherited from parent
    directory and make these fields 'S' or 's' instead, causing test
    failures.
  * "git for-each-repo --config=<var> <cmd>" should not run <cmd> for
    any repository when the configuration variable <var> is not defined
    even once.
  * Fix 2.29 regression where "git mergetool --tool-help" fails to list
    all the available tools.
  * Fix for procedure to building CI test environment for mac.
  * The implementation of "git branch --sort" wrt the detached HEAD
    display has always been hacky, which has been cleaned up.
  * Newline characters in the host and path part of git:// URL are
    now forbidden.
  * "git diff" showed a submodule working tree with untracked cruft as
    "Submodule commit <objectname>-dirty", but a natural expectation is
    that the "-dirty" indicator would align with "git describe --dirty",
    which does not consider having untracked files in the working tree
    as source of dirtiness.  The inconsistency has been fixed.
  * When more than one commit with the same patch ID appears on one
    side, "git log --cherry-pick A...B" did not exclude them all when a
    commit with the same patch ID appears on the other side.  Now it
    does.
  * Documentation for "git fsck" lost stale bits that has become
    incorrect.
  * Doc fix for packfile URI feature.
  * When "git rebase -i" processes "fixup" insn, there is no reason to
    clean up the commit log message, but we did the usual stripspace
    processing.  This has been corrected.
    (merge f7d42ceec5 js/rebase-i-commit-cleanup-fix later to maint).
  * Fix in passing custom args from "git clone" to "upload-pack" on the
    other side.
    (merge ad6b5fefbd jv/upload-pack-filter-spec-quotefix later to maint).
  * The command line completion (in contrib/) completed "git branch -d"
    with branch names, but "git branch -D" offered tagnames in addition,
    which has been corrected.  "git branch -M" had the same problem.
    (merge 27dc071b9a jk/complete-branch-force-delete later to maint).
  * When commands are started from a subdirectory, they may have to
    compare the path to the subdirectory (called prefix and found out
    from $(pwd)) with the tracked paths.  On macOS, $(pwd) and
    readdir() yield decomposed path, while the tracked paths are
    usually normalized to the precomposed form, causing mismatch.  This
    has been fixed by taking the same approach used to normalize the
    command line arguments.
    (merge 5c327502db tb/precompose-prefix-too later to maint).
  * Even though invocations of "die()" were logged to the trace2
    system, "BUG()"s were not, which has been corrected.
    (merge 0a9dde4a04 jt/trace2-BUG later to maint).
  * "git grep --untracked" is meant to be "let's ALSO find in these
    files on the filesystem" when looking for matches in the working
    tree files, and does not make any sense if the primary search is
    done against the index, or the tree objects.  The "--cached" and
    "--untracked" options have been marked as mutually incompatible.
    (merge 0c5d83b248 mt/grep-cached-untracked later to maint).
  * Fix "git fsck --name-objects" which apparently has not been used by
    anybody who is motivated enough to report breakage.
    (merge e89f89361c js/fsck-name-objects-fix later to maint).
  * Avoid individual tests in t5411 from getting affected by each other
    by forcing them to use separate output files during the test.
    (merge 822ee894f6 jx/t5411-unique-filenames later to maint).
  * Test to make sure "git rev-parse one-thing one-thing" gives
    the same thing twice (when one-thing is --since=X).
    (merge a5cdca4520 ew/rev-parse-since-test later to maint).
  * When certain features (e.g. grafts) used in the repository are
    incompatible with the use of the commit-graph, we used to silently
    turned commit-graph off; we now tell the user what we are doing.
    (merge c85eec7fc3 js/commit-graph-warning later to maint).
  * Objects that lost references can be pruned away, even when they
    have notes attached to it (and these notes will become dangling,
    which in turn can be pruned with "git notes prune").  This has been
    clarified in the documentation.
    (merge fa9ab027ba mz/doc-notes-are-not-anchors later to maint).
  * The error codepath around the "--temp/--prefix" feature of "git
    checkout-index" has been improved.
    (merge 3f7ba60350 mt/checkout-index-corner-cases later to maint).
  * The "git maintenance register" command had trouble registering bare
    repositories, which had been corrected.
  * A handful of multi-word configuration variable names in
    documentation that are spelled in all lowercase have been corrected
    to use the more canonical camelCase.
    (merge 7dd0eaa39c dl/doc-config-camelcase later to maint).
  * "git push $there --delete ''" should have been diagnosed as an
    error, but instead turned into a matching push, which has been
    corrected.
    (merge 20e416409f jc/push-delete-nothing later to maint).
  * Test script modernization.
    (merge 488acf15df sv/t7001-modernize later to maint).
  * An under-allocation for the untracked cache data has been corrected.
    (merge 6347d649bc jh/untracked-cache-fix later to maint).
  * Other code cleanup, docfix, build fix, etc.
    (merge e3f5da7e60 sg/t7800-difftool-robustify later to maint).
    (merge 9d336655ba js/doc-proto-v2-response-end later to maint).
    (merge 1b5b8cf072 jc/maint-column-doc-typofix later to maint).
    (merge 3a837b58e3 cw/pack-config-doc later to maint).
    (merge 01168a9d89 ug/doc-commit-approxidate later to maint).
    (merge b865734760 js/params-vs-args later to maint).

27

Documentation/RelNotes/2.31.1.txt Normal file

View File

 @ -0,0 +1,27 @@
 Git 2.31.1 Release Notes
 ========================
 Fixes since v2.31
 -----------------
  * The fsmonitor interface read from its input without making sure
    there is something to read from.  This bug is new in 2.31
    timeframe.
  * The data structure used by fsmonitor interface was not properly
    duplicated during an in-core merge, leading to use-after-free etc.
  * "git bisect" reimplemented more in C during 2.30 timeframe did not
    take an annotated tag as a good/bad endpoint well.  This regression
    has been corrected.
  * Fix macros that can silently inject unintended null-statements.
  * CALLOC_ARRAY() macro replaces many uses of xcalloc().
  * Update insn in Makefile comments to run fuzz-all target.
  * Fix a corner case bug in "git mv" on case insensitive systems,
    which was introduced in 2.29 timeframe.
 Also contains various documentation updates and code clean-ups.

6

Documentation/RelNotes/2.31.2.txt Normal file

View File

 @ -0,0 +1,6 @@
 Git v2.31.2 Release Notes
 =========================
 This release merges up the fixes that appear in v2.30.3 to address
 the security issue CVE-2022-24765; see the release notes for that
 version for details.

4

Documentation/RelNotes/2.31.3.txt Normal file

View File

 @ -0,0 +1,4 @@
 Git Documentation/RelNotes/2.31.3.txt Release Notes
 =========================
 This release merges up the fixes that appear in v2.31.3.

6

Documentation/RelNotes/2.31.4.txt Normal file

View File

 @ -0,0 +1,6 @@
 Git v2.31.4 Release Notes
 =========================
 This release merges up the fixes that appear in v2.30.5 to address
 the security issue CVE-2022-29187; see the release notes for that
 version for details.

5

Documentation/RelNotes/2.31.5.txt Normal file

View File

 @ -0,0 +1,5 @@
 Git v2.31.5 Release Notes
 =========================
 This release merges the security fix that appears in v2.30.6; see
 the release notes for that version for details.

5

Documentation/RelNotes/2.31.6.txt Normal file

View File

 @ -0,0 +1,5 @@
 Git v2.31.6 Release Notes
 =========================
 This release merges the security fix that appears in v2.30.7; see
 the release notes for that version for details.

6

Documentation/RelNotes/2.31.7.txt Normal file

View File

 @ -0,0 +1,6 @@
 Git v2.31.7 Release Notes
 =========================
 This release merges up the fixes that appear in v2.30.8 to
 address the security issues CVE-2023-22490 and CVE-2023-23946;
 see the release notes for that version for details.

2

Documentation/blame-options.txt

View File

 @ -1,6 +1,6 @@
 -b::
 	Show blank SHA-1 for boundary commits.  This can also
 	be controlled via the `blame.blankboundary` config option.
 	be controlled via the `blame.blankBoundary` config option.
 --root::
 	Do not treat root commits as boundaries.  This can also be

4

Documentation/config.txt

View File

 @ -46,7 +46,7 @@ Subsection names are case sensitive and can contain any characters except
 newline and the null byte. Doublequote `"` and backslash can be included
 by escaping them as `\"` and `\\`, respectively. Backslashes preceding
 other characters are dropped when reading; for example, `\t` is read as
 `t` and `\0` is read as `0` Section headers cannot span multiple lines.
 `t` and `\0` is read as `0`. Section headers cannot span multiple lines.
 Variables may belong directly to a section or to a given subsection. You
 can have `[section]` if you have `[section "subsection"]`, but you don't
 need to.
 @ -398,6 +398,8 @@ include::config/interactive.txt[]
 include::config/log.txt[]
 include::config/lsrefs.txt[]
 include::config/mailinfo.txt[]
 include::config/mailmap.txt[]

2

Documentation/config/core.txt

View File

 @ -625,4 +625,6 @@ core.abbrev::
 	computed based on the approximate number of packed objects
 	in your repository, which hopefully is enough for
 	abbreviated object names to stay unique for some time.
 	If set to "no", no abbreviation is made and the object names
 	are shown in their full length.
 	The minimum length is 4.

2

Documentation/config/diff.txt

View File

 @ -85,6 +85,8 @@ diff.ignoreSubmodules::
 	and 'git status' when `status.submoduleSummary` is set unless it is
 	overridden by using the --ignore-submodules command-line option.
 	The 'git submodule' commands are not affected by this setting.
 	By default this is set to untracked so that any untracked
 	submodules are ignored.
 diff.mnemonicPrefix::
 	If set, 'git diff' uses a prefix pair that is different from the

2

Documentation/config/init.txt

View File

 @ -4,4 +4,4 @@ init.templateDir::
 init.defaultBranch::
 	Allows overriding the default branch name e.g. when initializing
 	a new repository or when cloning an empty repository.
 	a new repository.

9

Documentation/config/lsrefs.txt Normal file

View File

 @ -0,0 +1,9 @@
 lsrefs.unborn::
 	May be "advertise" (the default), "allow", or "ignore". If "advertise",
 	the server will respond to the client sending "unborn" (as described in
 	protocol-v2.txt) and will advertise support for this feature during the
 	protocol v2 capability advertisement. "allow" is the same as
 	"advertise" except that the server will not advertise support for this
 	feature; this is useful for load-balanced servers that cannot be
 	updated atomically (for example), since the administrator could
 	configure "allow", then after a delay, configure "advertise".

5

Documentation/config/maintenance.txt

View File

 @ -15,8 +15,9 @@ maintenance.strategy::
 * `none`: This default setting implies no task are run at any schedule.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly and the
   `loose-objects` and `incremental-repack` tasks daily.
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
   `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
   task weekly.
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task

15

Documentation/config/mergetool.txt

View File

 @ -13,6 +13,11 @@ mergetool.<tool>.cmd::
 	merged; 'MERGED' contains the name of the file to which the merge
 	tool should write the results of a successful merge.
 mergetool.<tool>.hideResolved::
 	Allows the user to override the global `mergetool.hideResolved` value
 	for a specific tool. See `mergetool.hideResolved` for the full
 	description.
 mergetool.<tool>.trustExitCode::
 	For a custom merge command, specify whether the exit code of
 	the merge command can be used to determine whether the merge was
 @ -40,6 +45,16 @@ mergetool.meld.useAutoMerge::
 	value of `false` avoids using `--auto-merge` altogether, and is the
 	default value.
 mergetool.hideResolved::
 	During a merge Git will automatically resolve as many conflicts as
 	possible and write the 'MERGED' file containing conflict markers around
 	any conflicts that it cannot resolve; 'LOCAL' and 'REMOTE' normally
 	represent the versions of the file from before Git's conflict
 	resolution. This flag causes 'LOCAL' and 'REMOTE' to be overwriten so
 	that only the unresolved conflicts are presented to the merge tool. Can
 	be configured per-tool via the `mergetool.<tool>.hideResolved`
 	configuration variable. Defaults to `false`.
 mergetool.keepBackup::
 	After performing a merge, the original file with conflict markers
 	can be saved as a file with a `.orig` extension.  If this variable

7

Documentation/config/pack.txt

View File

 @ -133,3 +133,10 @@ pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
 	for each new packfile that it writes in all places except for
 	linkgit:git-fast-import[1] and in the bulk checkin mechanism.
 	Defaults to false.

3

Documentation/config/rebase.txt

View File

 @ -68,3 +68,6 @@ rebase.rescheduleFailedExec::
 	Automatically reschedule `exec` commands that failed. This only makes
 	sense in interactive mode (or when an `--exec` option was provided).
 	This is the same as specifying the `--reschedule-failed-exec` option.
 rebase.forkPoint::
 	If set to false set `--no-fork-point` option by default.

11

Documentation/date-formats.txt

View File

 @ -1,10 +1,7 @@
 DATE FORMATS
 ------------
 The `GIT_AUTHOR_DATE`, `GIT_COMMITTER_DATE` environment variables
 ifdef::git-commit[]
 and the `--date` option
 endif::git-commit[]
 The `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables
 support the following date formats:
 Git internal format::
 @ -26,3 +23,9 @@ ISO 8601::
 +
 NOTE: In addition, the date part is accepted in the following formats:
 `YYYY.MM.DD`, `MM/DD/YYYY` and `DD.MM.YYYY`.
 ifdef::git-commit[]
 In addition to recognizing all date formats above, the `--date` option
 will also try to make sense of other, more human-centric date formats,
 such as relative dates like "yesterday" or "last Friday at noon".
 endif::git-commit[]

6

Documentation/diff-generate-patch.txt

View File

 @ -81,9 +81,9 @@ Combined diff format
 Any diff-generating command can take the `-c` or `--cc` option to
 produce a 'combined diff' when showing a merge. This is the default
 format when showing merges with linkgit:git-diff[1] or
 linkgit:git-show[1]. Note also that you can give the `-m` option to any
 of these commands to force generation of diffs with individual parents
 of a merge.
 linkgit:git-show[1]. Note also that you can give suitable
 `--diff-merges` option to any of these commands to force generation of
 diffs in specific format.
 A "combined diff" format looks like this:

59

Documentation/diff-options.txt

View File

 @ -33,6 +33,57 @@ endif::git-diff[]
 	show the patch by default, or to cancel the effect of `--patch`.
 endif::git-format-patch[]
 ifdef::git-log[]
 --diff-merges=(off|none|first-parent|1|separate|m|combined|c|dense-combined|cc)::
 --no-diff-merges::
 	Specify diff format to be used for merge commits. Default is
 	{diff-merges-default} unless `--first-parent` is in use, in which case
 	`first-parent` is the default.
 +
 --diff-merges=(off|none):::
 --no-diff-merges:::
 	Disable output of diffs for merge commits. Useful to override
 	implied value.
 +
 --diff-merges=first-parent:::
 --diff-merges=1:::
 	This option makes merge commits show the full diff with
 	respect to the first parent only.
 +
 --diff-merges=separate:::
 --diff-merges=m:::
 -m:::
 	This makes merge commits show the full diff with respect to
 	each of the parents. Separate log entry and diff is generated
 	for each parent. `-m` doesn't produce any output without `-p`.
 +
 --diff-merges=combined:::
 --diff-merges=c:::
 -c:::
 	With this option, diff output for a merge commit shows the
 	differences from each of the parents to the merge result
 	simultaneously instead of showing pairwise diff between a
 	parent and the result one at a time. Furthermore, it lists
 	only files which were modified from all parents. `-c` implies
 	`-p`.
 +
 --diff-merges=dense-combined:::
 --diff-merges=cc:::
 --cc:::
 	With this option the output produced by
 	`--diff-merges=combined` is further compressed by omitting
 	uninteresting hunks whose contents in the parents have only
 	two variants and the merge result picks one of them without
 	modification.  `--cc` implies `-p`.
 --combined-all-paths::
 	This flag causes combined diffs (used for merge commits) to
 	list the name of the file from all parents.  It thus only has
 	effect when `--diff-merges=[dense-]combined` is in use, and
 	is likely only useful if filename changes are detected (i.e.
 	when either rename or copy detection have been requested).
 endif::git-log[]
 -U<n>::
 --unified=<n>::
 	Generate diffs with <n> lines of context instead of
 @ -649,6 +700,14 @@ matches a pattern if removing any number of the final pathname
 components matches the pattern.  For example, the pattern "`foo*bar`"
 matches "`fooasdfbar`" and "`foo/bar/baz/asdf`" but not "`foobarx`".
 --skip-to=<file>::
 --rotate-to=<file>::
 	Discard the files before the named <file> from the output
 	(i.e. 'skip to'), or move them to the end of the output
 	(i.e. 'rotate to').  These were invented primarily for use
 	of the `git difftool` command, and may not be very useful
 	otherwise.
 ifndef::git-format-patch[]
 -R::
 	Swap two inputs; that is, show differences from index or

4

Documentation/fetch-options.txt

View File

 @ -7,6 +7,10 @@
 	existing contents of `.git/FETCH_HEAD`.  Without this
 	option old data in `.git/FETCH_HEAD` will be overwritten.
 --atomic::
 	Use an atomic transaction to update local refs. Either all refs are
 	updated, or on error, no refs are updated.
 --depth=<depth>::
 	Limit fetching to the specified number of commits from the tip of
 	each remote branch history. If fetching to a 'shallow' repository

2

Documentation/git-am.txt

View File

 @ -79,7 +79,7 @@ OPTIONS
 	Pass `-u` flag to 'git mailinfo' (see linkgit:git-mailinfo[1]).
 	The proposed commit log message taken from the e-mail
 	is re-coded into UTF-8 encoding (configuration variable
 	`i18n.commitencoding` can be used to specify project's
 	`i18n.commitEncoding` can be used to specify project's
 	preferred encoding if it is not UTF-8).
 +
 This was optional in prior versions of git, but now it is the

2

Documentation/git-blame.txt

View File

 @ -226,7 +226,7 @@ commit commentary), a blame viewer will not care.
 MAPPING AUTHORS
 ---------------
 include::mailmap.txt[]
 See linkgit:gitmailmap[5].
 SEE ALSO

6

Documentation/git-branch.txt

View File

 @ -78,8 +78,8 @@ renaming. If <newbranch> exists, -M must be used to force the rename
 to happen.
 The `-c` and `-C` options have the exact same semantics as `-m` and
 `-M`, except instead of the branch being renamed it along with its
 config and reflog will be copied to a new name.
 `-M`, except instead of the branch being renamed, it will be copied to a
 new name, along with its config and reflog.
 With a `-d` or `-D` option, `<branchname>` will be deleted.  You may
 specify more than one branch for deletion.  If the branch currently
 @ -153,7 +153,7 @@ OPTIONS
 --column[=<options>]::
 --no-column::
 	Display branch listing in columns. See configuration variable
 	column.branch for option syntax.`--column` and `--no-column`
 	`column.branch` for option syntax. `--column` and `--no-column`
 	without options are equivalent to 'always' and 'never' respectively.
 +
 This option is only applicable in non-verbose mode.

9

Documentation/git-check-mailmap.txt

View File

 @ -36,10 +36,17 @@ name is provided or known to the 'mailmap', ``Name $$<user@host>$$'' is
 printed; otherwise only ``$$<user@host>$$'' is printed.
 CONFIGURATION
 -------------
 See `mailmap.file` and `mailmap.blob` in linkgit:git-config[1] for how
 to specify a custom `.mailmap` target file or object.
 MAPPING AUTHORS
 ---------------
 include::mailmap.txt[]
 See linkgit:gitmailmap[5].
 GIT

16

Documentation/git-config.txt

View File

 @ -346,6 +346,22 @@ GIT_CONFIG_NOSYSTEM::
 See also <<FILES>>.
 GIT_CONFIG_COUNT::
 GIT_CONFIG_KEY_<n>::
 GIT_CONFIG_VALUE_<n>::
 	If GIT_CONFIG_COUNT is set to a positive number, all environment pairs
 	GIT_CONFIG_KEY_<n> and GIT_CONFIG_VALUE_<n> up to that number will be
 	added to the process's runtime configuration. The config pairs are
 	zero-indexed. Any missing key or value is treated as an error. An empty
 	GIT_CONFIG_COUNT is treated the same as GIT_CONFIG_COUNT=0, namely no
 	pairs are processed. These environment variables will override values
 	in configuration files, but will be overridden by any explicit options
 	passed via `git -c`.
 +
 This is useful for cases where you want to spawn multiple git commands
 with a common configuration but cannot depend on a configuration file,
 for example when writing scripts.
 [[EXAMPLES]]
 EXAMPLES

8

Documentation/git-difftool.txt

View File

 @ -34,6 +34,14 @@ OPTIONS
 	This is the default behaviour; the option is provided to
 	override any configuration settings.
 --rotate-to=<file>::
 	Start showing the diff for the given path,
 	the paths before it will move to end and output.
 --skip-to=<file>::
 	Start showing the diff for the given path, skipping all
 	the paths before it.
 -t <tool>::
 --tool=<tool>::
 	Use the diff tool specified by <tool>.  Valid values include

8

Documentation/git-for-each-ref.txt

View File

 @ -260,11 +260,9 @@ contents:lines=N::
 	The first `N` lines of the message.
 Additionally, the trailers as interpreted by linkgit:git-interpret-trailers[1]
 are obtained as `trailers` (or by using the historical alias
 `contents:trailers`).  Non-trailer lines from the trailer block can be omitted
 with `trailers:only`. Whitespace-continuations can be removed from trailers so
 that each trailer appears on a line by itself with its full content with
 `trailers:unfold`. Both can be used together as `trailers:unfold,only`.
 are obtained as `trailers[:options]` (or by using the historical alias
 `contents:trailers[:options]`). For valid [:option] values see `trailers`
 section of linkgit:git-log[1].
 For sorting purposes, fields with numeric values sort in numeric order
 (`objectsize`, `authordate`, `committerdate`, `creatordate`, `taggerdate`).

14

Documentation/git-gc.txt

View File

 @ -117,12 +117,14 @@ NOTES
 'git gc' tries very hard not to delete objects that are referenced
 anywhere in your repository. In particular, it will keep not only
 objects referenced by your current set of branches and tags, but also
 objects referenced by the index, remote-tracking branches, notes saved
 by 'git notes' under refs/notes/, reflogs (which may reference commits
 in branches that were later amended or rewound), and anything else in
 the refs/* namespace.  If you are expecting some objects to be deleted
 and they aren't, check all of those locations and decide whether it
 makes sense in your case to remove those references.
 objects referenced by the index, remote-tracking branches, reflogs
 (which may reference commits in branches that were later amended or
 rewound), and anything else in the refs/* namespace. Note that a note
 (of the kind created by 'git notes') attached to an object does not
 contribute in keeping the object alive. If you are expecting some
 objects to be deleted and they aren't, check all of those locations
 and decide whether it makes sense in your case to remove those
 references.
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using

10

Documentation/git-http-fetch.txt

View File

 @ -41,11 +41,17 @@ commit-id::
 		<commit-id>['\t'<filename-as-in--w>]
 --packfile=<hash>::
 	Instead of a commit id on the command line (which is not expected in
 	For internal use only. Instead of a commit id on the command
 	line (which is not expected in
 	this case), 'git http-fetch' fetches the packfile directly at the given
 	URL and uses index-pack to generate corresponding .idx and .keep files.
 	The hash is used to determine the name of the temporary file and is
 	arbitrary. The output of index-pack is printed to stdout.
 	arbitrary. The output of index-pack is printed to stdout. Requires
 	--index-pack-args.
 --index-pack-args=<args>::
 	For internal use only. The command to run on the contents of the
 	downloaded pack. Arguments are URL-encoded separated by spaces.
 --recover::
 	Verify that everything reachable from target is fetched.  Used after

25

Documentation/git-index-pack.txt

View File

 @ -9,17 +9,18 @@ git-index-pack - Build pack index file for an existing packed archive
 SYNOPSIS
 --------
 [verse]
 'git index-pack' [-v] [-o <index-file>] <pack-file>
 'git index-pack' [-v] [-o <index-file>] [--[no-]rev-index] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
                  [<pack-file>]
 		  [--[no-]rev-index] [<pack-file>]
 DESCRIPTION
 -----------
 Reads a packed archive (.pack) from the specified file, and
 builds a pack index file (.idx) for it.  The packed archive
 together with the pack index can then be placed in the
 objects/pack/ directory of a Git repository.
 builds a pack index file (.idx) for it. Optionally writes a
 reverse-index (.rev) for the specified pack. The packed
 archive together with the pack index can then be placed in
 the objects/pack/ directory of a Git repository.
 OPTIONS
 @ -35,6 +36,13 @@ OPTIONS
 	fails if the name of packed archive does not end
 	with .pack).
 --[no-]rev-index::
 	When this flag is provided, generate a reverse index
 	(a `.rev` file) corresponding to the given pack. If
 	`--verify` is given, ensure that the existing
 	reverse index is correct. Takes precedence over
 	`pack.writeReverseIndex`.
 --stdin::
 	When this flag is provided, the pack is read from stdin
 	instead and a copy is then written to <pack-file>. If
 @ -78,7 +86,12 @@ OPTIONS
 	Die if the pack contains broken links. For internal use only.
 --fsck-objects::
 	Die if the pack contains broken objects. For internal use only.
 	For internal use only.
 +
 Die if the pack contains broken objects. If the pack contains a tree
 pointing to a .gitmodules blob that does not exist, prints the hash of
 that blob (for the caller to check) after the hash that goes into the
 name of the pack/idx file (see "Notes").
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving

46

Documentation/git-log.txt

View File

 @ -107,47 +107,15 @@ DIFF FORMATTING
 By default, `git log` does not generate any diff output. The options
 below can be used to show the changes made by each commit.
 Note that unless one of `-c`, `--cc`, or `-m` is given, merge commits
 will never show a diff, even if a diff format like `--patch` is
 selected, nor will they match search options like `-S`. The exception is
 when `--first-parent` is in use, in which merges are treated like normal
 single-parent commits (this can be overridden by providing a
 combined-diff option or with `--no-diff-merges`).
 -c::
 	With this option, diff output for a merge commit
 	shows the differences from each of the parents to the merge result
 	simultaneously instead of showing pairwise diff between a parent
 	and the result one at a time. Furthermore, it lists only files
 	which were modified from all parents.
 --cc::
 	This flag implies the `-c` option and further compresses the
 	patch output by omitting uninteresting hunks whose contents in
 	the parents have only two variants and the merge result picks
 	one of them without modification.
 --combined-all-paths::
 	This flag causes combined diffs (used for merge commits) to
 	list the name of the file from all parents.  It thus only has
 	effect when -c or --cc are specified, and is likely only
 	useful if filename changes are detected (i.e. when either
 	rename or copy detection have been requested).
 -m::
 	This flag makes the merge commits show the full diff like
 	regular commits; for each merge parent, a separate log entry
 	and diff is generated. An exception is that only diff against
 	the first parent is shown when `--first-parent` option is given;
 	in that case, the output represents the changes the merge
 	brought _into_ the then-current branch.
 --diff-merges=off::
 --no-diff-merges::
 	Disable output of diffs for merge commits (default). Useful to
 	override `-m`, `-c`, or `--cc`.
 Note that unless one of `--diff-merges` variants (including short
 `-m`, `-c`, and `--cc` options) is explicitly given, merge commits
 will not show a diff, even if a diff format like `--patch` is
 selected, nor will they match search options like `-S`. The exception
 is when `--first-parent` is in use, in which case `first-parent` is
 the default format.
 :git-log: 1
 :diff-merges-default: `off`
 include::diff-options.txt[]
 include::diff-generate-patch.txt[]

8

Documentation/git-ls-files.txt

View File

 @ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
 		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
 @ -80,6 +81,13 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 --deduplicate::
 	When only filenames are shown, suppress duplicates that may
 	come from having multiple stages during a merge, or giving
 	`--deleted` and `--modified` option at the same time.
 	When any of the `-t`, `--unmerged`, or `--stage` option is
 	in use, this option has no effect.
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.

4

Documentation/git-mailinfo.txt

View File

 @ -53,7 +53,7 @@ character.
 	The commit log message, author name and author email are
 	taken from the e-mail, and after minimally decoding MIME
 	transfer encoding, re-coded in the charset specified by
 	i18n.commitencoding (defaulting to UTF-8) by transliterating
 	`i18n.commitEncoding` (defaulting to UTF-8) by transliterating
 	them.  This used to be optional but now it is the default.
 +
 Note that the patch is always used as-is without charset
 @ -61,7 +61,7 @@ conversion, even with this flag.
 --encoding=<encoding>::
 	Similar to -u.  But when re-coding, the charset specified here is
 	used instead of the one specified by i18n.commitencoding or UTF-8.
 	used instead of the one specified by `i18n.commitEncoding` or UTF-8.
 -n::
 	Disable all charset re-coding of the metadata.

122

Documentation/git-maintenance.txt

View File

 @ -145,6 +145,12 @@ incremental-repack::
 	which is a special case that attempts to repack all pack-files
 	into a single pack-file.
 pack-refs::
 	The `pack-refs` task collects the loose reference files and
 	collects them into a single file. This speeds up operations that
 	need to iterate across many references. See linkgit:git-pack-refs[1]
 	for more information.
 OPTIONS
 -------
 --auto::
 @ -218,6 +224,122 @@ Further, the `git gc` command should not be combined with
 but does not take the lock in the same way as `git maintenance run`. If
 possible, use `git maintenance run --task=gc` instead of `git gc`.
 The following sections describe the mechanisms put in place to run
 background maintenance by `git maintenance start` and how to customize
 them.
 BACKGROUND MAINTENANCE ON POSIX SYSTEMS
 ---------------------------------------
 The standard mechanism for scheduling background tasks on POSIX systems
 is cron(8). This tool executes commands based on a given schedule. The
 current list of user-scheduled tasks can be found by running `crontab -l`.
 The schedule written by `git maintenance start` is similar to this:
 -----------------------------------------------------------------------
 # BEGIN GIT MAINTENANCE SCHEDULE
 # The following schedule was created by Git
 # Any edits made in this region might be
 # replaced in the future by a Git command.
 1-23 * * * "/<path>/git" --exec-path="/<path>" for-each-repo --config=maintenance.repo maintenance run --schedule=hourly
 0 * * 1-6 "/<path>/git" --exec-path="/<path>" for-each-repo --config=maintenance.repo maintenance run --schedule=daily
 0 * * 0 "/<path>/git" --exec-path="/<path>" for-each-repo --config=maintenance.repo maintenance run --schedule=weekly
 # END GIT MAINTENANCE SCHEDULE
 -----------------------------------------------------------------------
 The comments are used as a region to mark the schedule as written by Git.
 Any modifications within this region will be completely deleted by
 `git maintenance stop` or overwritten by `git maintenance start`.
 The `crontab` entry specifies the full path of the `git` executable to
 ensure that the executed `git` command is the same one with which
 `git maintenance start` was issued independent of `PATH`. If the same user
 runs `git maintenance start` with multiple Git executables, then only the
 latest executable is used.
 These commands use `git for-each-repo --config=maintenance.repo` to run
 `git maintenance run --schedule=<frequency>` on each repository listed in
 the multi-valued `maintenance.repo` config option. These are typically
 loaded from the user-specific global config. The `git maintenance` process
 then determines which maintenance tasks are configured to run on each
 repository with each `<frequency>` using the `maintenance.<task>.schedule`
 config options. These values are loaded from the global or repository
 config values.
 If the config values are insufficient to achieve your desired background
 maintenance schedule, then you can create your own schedule. If you run
 `crontab -e`, then an editor will load with your user-specific `cron`
 schedule. In that editor, you can add your own schedule lines. You could
 start by adapting the default schedule listed earlier, or you could read
 the crontab(5) documentation for advanced scheduling techniques. Please
 do use the full path and `--exec-path` techniques from the default
 schedule to ensure you are executing the correct binaries in your
 schedule.
 BACKGROUND MAINTENANCE ON MACOS SYSTEMS
 ---------------------------------------
 While macOS technically supports `cron`, using `crontab -e` requires
 elevated privileges and the executed process does not have a full user
 context. Without a full user context, Git and its credential helpers
 cannot access stored credentials, so some maintenance tasks are not
 functional.
 Instead, `git maintenance start` interacts with the `launchctl` tool,
 which is the recommended way to schedule timed jobs in macOS. Scheduling
 maintenance through `git maintenance (start|stop)` requires some
 `launchctl` features available only in macOS 10.11 or later.
 Your user-specific scheduled tasks are stored as XML-formatted `.plist`
 files in `~/Library/LaunchAgents/`. You can see the currently-registered
 tasks using the following command:
 -----------------------------------------------------------------------
 $ ls ~/Library/LaunchAgents/org.git-scm.git*
 org.git-scm.git.daily.plist
 org.git-scm.git.hourly.plist
 org.git-scm.git.weekly.plist
 -----------------------------------------------------------------------
 One task is registered for each `--schedule=<frequency>` option. To
 inspect how the XML format describes each schedule, open one of these
 `.plist` files in an editor and inspect the `<array>` element following
 the `<key>StartCalendarInterval</key>` element.
 `git maintenance start` will overwrite these files and register the
 tasks again with `launchctl`, so any customizations should be done by
 creating your own `.plist` files with distinct names. Similarly, the
 `git maintenance stop` command will unregister the tasks with `launchctl`
 and delete the `.plist` files.
 To create more advanced customizations to your background tasks, see
 launchctl.plist(5) for more information.
 BACKGROUND MAINTENANCE ON WINDOWS SYSTEMS
 -----------------------------------------
 Windows does not support `cron` and instead has its own system for
 scheduling background tasks. The `git maintenance start` command uses
 the `schtasks` command to submit tasks to this system. You can inspect
 all background tasks using the Task Scheduler application. The tasks
 added by Git have names of the form `Git Maintenance (<frequency>)`.
 The Task Scheduler GUI has ways to inspect these tasks, but you can also
 export the tasks to XML files and view the details there.
 Note that since Git is a console application, these background tasks
 create a console window visible to the current user. This can be changed
 manually by selecting the "Run whether user is logged in or not" option
 in Task Scheduler. This change requires a password input, which is why
 `git maintenance start` does not select it by default.
 If you want to customize the background tasks, please rename the tasks
 so future calls to `git maintenance (start|stop)` do not overwrite your
 custom tasks.
 GIT
 ---

4

Documentation/git-mergetool--lib.txt

View File

 @ -38,6 +38,10 @@ get_merge_tool_cmd::
 get_merge_tool_path::
 	returns the custom path for a merge tool.
 initialize_merge_tool::
 	bring merge tool specific functions into scope so they can be used or
 	overridden.
 run_merge_tool::
 	launches a merge tool given the tool name and a true/false
 	flag to indicate whether a merge base is present.

4

Documentation/git-mergetool.txt

View File

 @ -99,6 +99,10 @@ success of the resolution after the custom tool has exited.
 	(see linkgit:git-config[1]).  To cancel `diff.orderFile`,
 	use `-O/dev/null`.
 CONFIGURATION
 -------------
 include::config/mergetool.txt[]
 TEMPORARY FILES
 ---------------
 `git mergetool` creates `*.orig` backup files while resolving merges.

39

Documentation/git-mktag.txt

View File

 @ -3,7 +3,7 @@ git-mktag(1)
 NAME
 ----
 git-mktag - Creates a tag object
 git-mktag - Creates a tag object with extra validation
 SYNOPSIS
 @ -11,25 +11,52 @@ SYNOPSIS
 [verse]
 'git mktag'
 OPTIONS
 -------
 --strict::
 	By default mktag turns on the equivalent of
 	linkgit:git-fsck[1] `--strict` mode. Use `--no-strict` to
 	disable it.
 DESCRIPTION
 -----------
 Reads a tag contents on standard input and creates a tag object
 that can also be used to sign other objects.
 The output is the new tag's <object> identifier.
 Reads a tag contents on standard input and creates a tag object. The
 output is the new tag's <object> identifier.
 This command is mostly equivalent to linkgit:git-hash-object[1]
 invoked with `-t tag -w --stdin`. I.e. both of these will create and
 write a tag found in `my-tag`:
     git mktag <my-tag
     git hash-object -t tag -w --stdin <my-tag
 The difference is that mktag will die before writing the tag if the
 tag doesn't pass a linkgit:git-fsck[1] check.
 The "fsck" check done mktag is stricter than what linkgit:git-fsck[1]
 would run by default in that all `fsck.<msg-id>` messages are promoted
 from warnings to errors (so e.g. a missing "tagger" line is an error).
 Extra headers in the object are also an error under mktag, but ignored
 by linkgit:git-fsck[1]. This extra check can be turned off by setting
 the appropriate `fsck.<msg-id>` varible:
     git -c fsck.extraHeaderEntry=ignore mktag <my-tag-with-headers
 Tag Format
 ----------
 A tag signature file, to be fed to this command's standard input,
 has a very simple fixed format: four lines of
   object <sha1>
   object <hash>
   type <typename>
   tag <tagname>
   tagger <tagger>
 followed by some 'optional' free-form message (some tags created
 by older Git may not have `tagger` line).  The message, when
 by older Git may not have `tagger` line).  The message, when it
 exists, is separated by a blank line from the header.  The
 message part may contain a signature that Git itself doesn't
 care about, but that can be verified with gpg.

11

Documentation/git-pack-objects.txt

View File

 @ -400,6 +400,17 @@ Note that we pick a single island for each regex to go into, using "last
 one wins" ordering (which allows repo-specific config to take precedence
 over user-wide config, and so forth).
 CONFIGURATION
 -------------
 Various configuration variables affect packing, see
 linkgit:git-config[1] (search for "pack" and "delta").
 Notably, delta compression is not used on objects larger than the
 `core.bigFileThreshold` configuration variable and on files with the
 attribute `delta` set to false.
 SEE ALSO
 --------
 linkgit:git-rev-list[1]

20

Documentation/git-range-diff.txt

View File

 @ -10,6 +10,7 @@ SYNOPSIS
 [verse]
 'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
 	[--no-dual-color] [--creation-factor=<factor>]
 	[--left-only | --right-only]
 	( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
 DESCRIPTION
 @ -28,6 +29,17 @@ Finally, the list of matching commits is shown in the order of the
 second commit range, with unmatched commits being inserted just after
 all of their ancestors have been shown.
 There are three ways to specify the commit ranges:
 - `<range1> <range2>`: Either commit range can be of the form
   `<base>..<rev>`, `<rev>^!` or `<rev>^-<n>`. See `SPECIFYING RANGES`
   in linkgit:gitrevisions[7] for more details.
 - `<rev1>...<rev2>`. This is equivalent to
   `<rev2>..<rev1> <rev1>..<rev2>`.
 - `<base> <rev1> <rev2>`: This is equivalent to `<base>..<rev1>
   <base>..<rev2>`.
 OPTIONS
 -------
 @ -57,6 +69,14 @@ to revert to color all lines according to the outer diff markers
 	See the ``Algorithm`` section below for an explanation why this is
 	needed.
 --left-only::
 	Suppress commits that are missing from the first specified range
 	(or the "left range" when using the `<rev1>...<rev2>` format).
 --right-only::
 	Suppress commits that are missing from the second specified range
 	(or the "right range" when using the `<rev1>...<rev2>` format).
 --[no-]notes[=<ref>]::
 	This flag is passed to the `git log` program
 	(see linkgit:git-log[1]) that generates the patches.

9

Documentation/git-repack.txt

View File

 @ -165,9 +165,12 @@ depth is 4095.
 	Pass the `--delta-islands` option to `git-pack-objects`, see
 	linkgit:git-pack-objects[1].
 Configuration
 CONFIGURATION
 -------------
 Various configuration variables affect packing, see
 linkgit:git-config[1] (search for "pack" and "delta").
 By default, the command passes `--delta-base-offset` option to
 'git pack-objects'; this typically results in slightly smaller packs,
 but the generated packs are incompatible with versions of Git older than
 @ -178,6 +181,10 @@ need to set the configuration variable `repack.UseDeltaBaseOffset` to
 is unaffected by this option as the conversion is performed on the fly
 as needed in that case.
 Delta compression is not used on objects larger than the
 `core.bigFileThreshold` configuration variable and on files with the
 attribute `delta` set to false.
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]

93

Documentation/git-rev-list.txt

View File

 @ -31,6 +31,99 @@ include::rev-list-options.txt[]
 include::pretty-formats.txt[]
 EXAMPLES
 --------
 * Print the list of commits reachable from the current branch.
 +
 ----------
 git rev-list HEAD
 ----------
 * Print the list of commits on this branch, but not present in the
   upstream branch.
 +
 ----------
 git rev-list @{upstream}..HEAD
 ----------
 * Format commits with their author and commit message (see also the
   porcelain linkgit:git-log[1]).
 +
 ----------
 git rev-list --format=medium HEAD
 ----------
 * Format commits along with their diffs (see also the porcelain
   linkgit:git-log[1], which can do this in a single process).
 +
 ----------
 git rev-list HEAD |
 git diff-tree --stdin --format=medium -p
 ----------
 * Print the list of commits on the current branch that touched any
   file in the `Documentation` directory.
 +
 ----------
 git rev-list HEAD -- Documentation/
 ----------
 * Print the list of commits authored by you in the past year, on
   any branch, tag, or other ref.
 +
 ----------
 git rev-list --author=you@example.com --since=1.year.ago --all
 ----------
 * Print the list of objects reachable from the current branch (i.e., all
   commits and the blobs and trees they contain).
 +
 ----------
 git rev-list --objects HEAD
 ----------
 * Compare the disk size of all reachable objects, versus those
   reachable from reflogs, versus the total packed size. This can tell
   you whether running `git repack -ad` might reduce the repository size
   (by dropping unreachable objects), and whether expiring reflogs might
   help.
 +
 ----------
 # reachable objects
 git rev-list --disk-usage --objects --all
 # plus reflogs
 git rev-list --disk-usage --objects --all --reflog
 # total disk size used
 du -c .git/objects/pack/*.pack .git/objects/??/*
 # alternative to du: add up "size" and "size-pack" fields
 git count-objects -v
 ----------
 * Report the disk size of each branch, not including objects used by the
   current branch. This can find outliers that are contributing to a
   bloated repository size (e.g., because somebody accidentally committed
   large build artifacts).
 +
 ----------
 git for-each-ref --format='%(refname)' |
 while read branch
 do
 	size=$(git rev-list --disk-usage --objects HEAD..$branch)
 	echo "$size $branch"
 done |
 sort -n
 ----------
 * Compare the on-disk size of branches in one group of refs, excluding
   another. If you co-mingle objects from multiple remotes in a single
   repository, this can show which remotes are contributing to the
   repository size (taking the size of `origin` as a baseline).
 +
 ----------
 git rev-list --disk-usage --objects --remotes=$suspect --not --remotes=origin
 ----------
 GIT
 ---
 Part of the linkgit:git[1] suite

74

Documentation/git-rev-parse.txt

View File

 @ -212,6 +212,18 @@ Options for Files
 	Only the names of the variables are listed, not their value,
 	even if they are set.
 --path-format=(absolute|relative)::
 	Controls the behavior of certain other options. If specified as absolute, the
 	paths printed by those options will be absolute and canonical. If specified as
 	relative, the paths will be relative to the current working directory if that
 	is possible.  The default is option specific.
 +
 This option may be specified multiple times and affects only the arguments that
 follow it on the command line, either to the end of the command line or the next
 instance of this option.
 The following options are modified by `--path-format`:
 --git-dir::
 	Show `$GIT_DIR` if defined. Otherwise show the path to
 	the .git directory. The path shown, when relative, is
 @ -221,13 +233,42 @@ If `$GIT_DIR` is not defined and the current directory
 is not detected to lie in a Git repository or work tree
 print a message to stderr and exit with nonzero status.
 --git-common-dir::
 	Show `$GIT_COMMON_DIR` if defined, else `$GIT_DIR`.
 --resolve-git-dir <path>::
 	Check if <path> is a valid repository or a gitfile that
 	points at a valid repository, and print the location of the
 	repository.  If <path> is a gitfile then the resolved path
 	to the real repository is printed.
 --git-path <path>::
 	Resolve "$GIT_DIR/<path>" and takes other path relocation
 	variables such as $GIT_OBJECT_DIRECTORY,
 	$GIT_INDEX_FILE... into account. For example, if
 	$GIT_OBJECT_DIRECTORY is set to /foo/bar then "git rev-parse
 	--git-path objects/abc" returns /foo/bar/abc.
 --show-toplevel::
 	Show the (by default, absolute) path of the top-level directory
 	of the working tree. If there is no working tree, report an error.
 --show-superproject-working-tree::
 	Show the absolute path of the root of the superproject's
 	working tree (if exists) that uses the current repository as
 	its submodule.  Outputs nothing if the current repository is
 	not used as a submodule by any project.
 --shared-index-path::
 	Show the path to the shared index file in split index mode, or
 	empty if not in split-index mode.
 The following options are unaffected by `--path-format`:
 --absolute-git-dir::
 	Like `--git-dir`, but its output is always the canonicalized
 	absolute path.
 --git-common-dir::
 	Show `$GIT_COMMON_DIR` if defined, else `$GIT_DIR`.
 --is-inside-git-dir::
 	When the current working directory is below the repository
 	directory print "true", otherwise "false".
 @ -242,19 +283,6 @@ print a message to stderr and exit with nonzero status.
 --is-shallow-repository::
 	When the repository is shallow print "true", otherwise "false".
 --resolve-git-dir <path>::
 	Check if <path> is a valid repository or a gitfile that
 	points at a valid repository, and print the location of the
 	repository.  If <path> is a gitfile then the resolved path
 	to the real repository is printed.
 --git-path <path>::
 	Resolve "$GIT_DIR/<path>" and takes other path relocation
 	variables such as $GIT_OBJECT_DIRECTORY,
 	$GIT_INDEX_FILE... into account. For example, if
 	$GIT_OBJECT_DIRECTORY is set to /foo/bar then "git rev-parse
 	--git-path objects/abc" returns /foo/bar/abc.
 --show-cdup::
 	When the command is invoked from a subdirectory, show the
 	path of the top-level directory relative to the current
 @ -265,20 +293,6 @@ print a message to stderr and exit with nonzero status.
 	path of the current directory relative to the top-level
 	directory.
 --show-toplevel::
 	Show the absolute path of the top-level directory of the working
 	tree. If there is no working tree, report an error.
 --show-superproject-working-tree::
 	Show the absolute path of the root of the superproject's
 	working tree (if exists) that uses the current repository as
 	its submodule.  Outputs nothing if the current repository is
 	not used as a submodule by any project.
 --shared-index-path::
 	Show the path to the shared index file in split index mode, or
 	empty if not in split-index mode.
 --show-object-format[=(storage|input|output)]::
 	Show the object format (hash algorithm) used for the repository
 	for storage inside the `.git` directory, input, or output. For

8

Documentation/git-shortlog.txt

View File

 @ -111,11 +111,11 @@ include::rev-list-options.txt[]
 MAPPING AUTHORS
 ---------------
 The `.mailmap` feature is used to coalesce together commits by the same
 person in the shortlog, where their name and/or email address was
 spelled differently.
 See linkgit:gitmailmap[5].
 include::mailmap.txt[]
 Note that if `git shortlog` is run outside of a repository (to process
 log contents on standard input), it will look for a `.mailmap` file in
 the current directory.
 GIT
 ---

7

Documentation/git-show.txt

View File

 @ -45,10 +45,13 @@ include::pretty-options.txt[]
 include::pretty-formats.txt[]
 COMMON DIFF OPTIONS
 -------------------
 DIFF FORMATTING
 ---------------
 The options below can be used to change the way `git show` generates
 diff output.
 :git-log: 1
 :diff-merges-default: `dense-combined`
 include::diff-options.txt[]
 include::diff-generate-patch.txt[]

8

Documentation/git-stash.txt

View File

 @ -8,8 +8,8 @@ git-stash - Stash the changes in a dirty working directory away
 SYNOPSIS
 --------
 [verse]
 'git stash' list [<options>]
 'git stash' show [<options>] [<stash>]
 'git stash' list [<log-options>]
 'git stash' show [<diff-options>] [<stash>]
 'git stash' drop [-q|--quiet] [<stash>]
 'git stash' ( pop | apply ) [--index] [-q|--quiet] [<stash>]
 'git stash' branch <branchname> [<stash>]
 @ -67,7 +67,7 @@ save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q
 	Instead, all non-option arguments are concatenated to form the stash
 	message.
 list [<options>]::
 list [<log-options>]::
 	List the stash entries that you currently have.  Each 'stash entry' is
 	listed with its name (e.g. `stash@{0}` is the latest entry, `stash@{1}` is
 @ -83,7 +83,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
 The command takes options applicable to the 'git log'
 command to control what is shown and how. See linkgit:git-log[1].
 show [<options>] [<stash>]::
 show [<diff-options>] [<stash>]::
 	Show the changes recorded in the stash entry as a diff between the
 	stashed contents and the commit back when the stash entry was first

2

Documentation/git-status.txt

View File

 @ -130,7 +130,7 @@ ignored, then the directory is not shown, but all contents are shown.
 --column[=<options>]::
 --no-column::
 	Display untracked files in columns. See configuration variable
 	column.status for option syntax.`--column` and `--no-column`
 	`column.status` for option syntax. `--column` and `--no-column`
 	without options are equivalent to 'always' and 'never'
 	respectively.

2

Documentation/git-tag.txt

View File

 @ -134,7 +134,7 @@ options for details.
 --column[=<options>]::
 --no-column::
 	Display tag listing in columns. See configuration variable
 	column.tag for option syntax.`--column` and `--no-column`
 	`column.tag` for option syntax. `--column` and `--no-column`
 	without options are equivalent to 'always' and 'never' respectively.
 +
 This option is only applicable when listing tags without annotation lines.

79

Documentation/git-worktree.txt

View File

 @ -97,8 +97,9 @@ list::
 List details of each working tree.  The main working tree is listed first,
 followed by each of the linked working trees.  The output details include
 whether the working tree is bare, the revision currently checked out, the
 branch currently checked out (or "detached HEAD" if none), and "locked" if
 the worktree is locked.
 branch currently checked out (or "detached HEAD" if none), "locked" if
 the worktree is locked, "prunable" if the worktree can be pruned by `prune`
 command.
 lock::
 @ -143,6 +144,11 @@ locate it. Running `repair` within the recently-moved working tree will
 reestablish the connection. If multiple linked working trees are moved,
 running `repair` from any working tree with each tree's new `<path>` as
 an argument, will reestablish the connection to all the specified paths.
 +
 If both the main working tree and linked working trees have been moved
 manually, then running `repair` in the main working tree and specifying the
 new `<path>` of each linked working tree will reestablish all connections
 in both directions.
 unlock::
 @ -226,9 +232,14 @@ This can also be set up as the default behaviour by using the
 -v::
 --verbose::
 	With `prune`, report all removals.
 +
 With `list`, output additional information about worktrees (see below).
 --expire <time>::
 	With `prune`, only expire unused working trees older than `<time>`.
 +
 With `list`, annotate missing working trees as prunable if they are
 older than `<time>`.
 --reason <string>::
 	With `lock`, an explanation why the working tree is locked.
 @ -367,13 +378,46 @@ $ git worktree list
 /path/to/other-linked-worktree  1234abc  (detached HEAD)
 ------------
 The command also shows annotations for each working tree, according to its state.
 These annotations are:
  * `locked`, if the working tree is locked.
  * `prunable`, if the working tree can be pruned via `git worktree prune`.
 ------------
 $ git worktree list
 /path/to/linked-worktree    abcd1234 [master]
 /path/to/locked-worktreee   acbd5678 (brancha) locked
 /path/to/prunable-worktree  5678abc  (detached HEAD) prunable
 ------------
 For these annotations, a reason might also be available and this can be
 seen using the verbose mode. The annotation is then moved to the next line
 indented followed by the additional information.
 ------------
 $ git worktree list --verbose
 /path/to/linked-worktree              abcd1234 [master]
 /path/to/locked-worktree-no-reason    abcd5678 (detached HEAD) locked
 /path/to/locked-worktree-with-reason  1234abcd (brancha)
 	locked: working tree path is mounted on a portable device
 /path/to/prunable-worktree            5678abc1 (detached HEAD)
 	prunable: gitdir file points to non-existent location
 ------------
 Note that the annotation is moved to the next line if the additional
 information is available, otherwise it stays on the same line as the
 working tree itself.
 Porcelain Format
 ~~~~~~~~~~~~~~~~
 The porcelain format has a line per attribute.  Attributes are listed with a
 label and value separated by a single space.  Boolean attributes (like `bare`
 and `detached`) are listed as a label only, and are present only
 if the value is true.  The first attribute of a working tree is always
 `worktree`, an empty line indicates the end of the record.  For example:
 if the value is true.  Some attributes (like `locked`) can be listed as a label
 only or with a value depending upon whether a reason is available.  The first
 attribute of a working tree is always `worktree`, an empty line indicates the
 end of the record.  For example:
 ------------
 $ git worktree list --porcelain
 @ -388,6 +432,33 @@ worktree /path/to/other-linked-worktree
 HEAD 1234abc1234abc1234abc1234abc1234abc1234a
 detached
 worktree /path/to/linked-worktree-locked-no-reason
 HEAD 5678abc5678abc5678abc5678abc5678abc5678c
 branch refs/heads/locked-no-reason
 locked
 worktree /path/to/linked-worktree-locked-with-reason
 HEAD 3456def3456def3456def3456def3456def3456b
 branch refs/heads/locked-with-reason
 locked reason why is locked
 worktree /path/to/linked-worktree-prunable
 HEAD 1233def1234def1234def1234def1234def1234b
 detached
 prunable gitdir file points to non-existent location
 ------------
 If the lock reason contains "unusual" characters such as newline, they
 are escaped and the entire reason is quoted as explained for the
 configuration variable `core.quotePath` (see linkgit:git-config[1]).
 For Example:
 ------------
 $ git worktree list --porcelain
 ...
 locked "reason\nwhy is locked"
 ...
 ------------
 EXAMPLES

24

Documentation/git.txt

View File

 @ -13,7 +13,7 @@ SYNOPSIS
     [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
     [-p|--paginate|-P|--no-pager] [--no-replace-objects] [--bare]
     [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
     [--super-prefix=<path>]
     [--super-prefix=<path>] [--config-env <name>=<envvar>]
     <command> [<args>]
 DESCRIPTION
 @ -80,6 +80,28 @@ config file). Including the equals but with an empty value (like `git -c
 foo.bar= ...`) sets `foo.bar` to the empty string which `git config
 --type=bool` will convert to `false`.
 --config-env=<name>=<envvar>::
 	Like `-c <name>=<value>`, give configuration variable
 	'<name>' a value, where <envvar> is the name of an
 	environment variable from which to retrieve the value. Unlike
 	`-c` there is no shortcut for directly setting the value to an
 	empty string, instead the environment variable itself must be
 	set to the empty string.  It is an error if the `<envvar>` does not exist
 	in the environment. `<envvar>` may not contain an equals sign
 	to avoid ambiguity with `<name>` containing one.
 +
 This is useful for cases where you want to pass transitory
 configuration options to git, but are doing so on OS's where
 other processes might be able to read your cmdline
 (e.g. `/proc/self/cmdline`), but not your environ
 (e.g. `/proc/self/environ`). That behavior is the default on
 Linux, but may not be on your system.
 +
 Note that this might add security for variables such as
 `http.extraHeader` where the sensitive information is part of
 the value, but not e.g. `url.<base>.insteadOf` where the
 sensitive information can be part of the key.
 --exec-path[=<path>]::
 	Path to wherever your core Git programs are installed.
 	This can also be controlled by setting the GIT_EXEC_PATH

41

Documentation/gitdiffcore.txt

View File

 @ -74,6 +74,7 @@ into another list.  There are currently 5 such transformations:
 - diffcore-merge-broken
 - diffcore-pickaxe
 - diffcore-order
 - diffcore-rotate
 These are applied in sequence.  The set of filepairs 'git diff-{asterisk}'
 commands find are used as the input to diffcore-break, and
 @ -168,6 +169,26 @@ a similarity score different from the default of 50% by giving a
 number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
 /10 = 80%).
 Note that when rename detection is on but both copy and break
 detection are off, rename detection adds a preliminary step that first
 checks if files are moved across directories while keeping their
 filename the same.  If there is a file added to a directory whose
 contents is sufficiently similar to a file with the same name that got
 deleted from a different directory, it will mark them as renames and
 exclude them from the later quadratic step (the one that pairwise
 compares all unmatched files to find the "best" matches, determined by
 the highest content similarity).  So, for example, if a deleted
 docs/ext.txt and an added docs/config/ext.txt are similar enough, they
 will be marked as a rename and prevent an added docs/ext.md that may
 be even more similar to the deleted docs/ext.txt from being considered
 as the rename destination in the later step.  For this reason, the
 preliminary "match same filename" step uses a bit higher threshold to
 mark a file pair as a rename and stop considering other candidates for
 better matches.  At most, one comparison is done per file in this
 preliminary pass; so if there are several remaining ext.txt files
 throughout the directory hierarchy after exact rename detection, this
 preliminary step will be skipped for those files.
 Note.  When the "-C" option is used with `--find-copies-harder`
 option, 'git diff-{asterisk}' commands feed unmodified filepairs to
 diffcore mechanism as well as modified ones.  This lets the copy
 @ -276,6 +297,26 @@ Documentation
 t
 ------------------------------------------------
 diffcore-rotate: For Changing At Which Path Output Starts
 ---------------------------------------------------------
 This transformation takes one pathname, and rotates the set of
 filepairs so that the filepair for the given pathname comes first,
 optionally discarding the paths that come before it.  This is used
 to implement the `--skip-to` and the `--rotate-to` options.  It is
 an error when the specified pathname is not in the set of filepairs,
 but it is not useful to error out when used with "git log" family of
 commands, because it is unreasonable to expect that a given path
 would be modified by each and every commit shown by the "git log"
 command.  For this reason, when used with "git log", the filepair
 that sorts the same as, or the first one that sorts after, the given
 pathname is where the output starts.
 Use of this transformation combined with diffcore-order will produce
 unexpected results, as the input to this transformation is likely
 not sorted when diffcore-order is in effect.
 SEE ALSO
 --------
 linkgit:git-diff[1],

123

Documentation/gitmailmap.txt Normal file

View File

 @ -0,0 +1,123 @@
 gitmailmap(5)
 =============
 NAME
 ----
 gitmailmap - Map author/committer names and/or E-Mail addresses
 SYNOPSIS
 --------
 $GIT_WORK_TREE/.mailmap
 DESCRIPTION
 -----------
 If the file `.mailmap` exists at the toplevel of the repository, or at
 the location pointed to by the `mailmap.file` or `mailmap.blob`
 configuration options (see linkgit:git-config[1]), it
 is used to map author and committer names and email addresses to
 canonical real names and email addresses.
 SYNTAX
 ------
 The '#' character begins a comment to the end of line, blank lines
 are ignored.
 In the simple form, each line in the file consists of the canonical
 real name of an author, whitespace, and an email address used in the
 commit (enclosed by '<' and '>') to map to the name. For example:
 --
 	Proper Name <commit@email.xx>
 --
 The more complex forms are:
 --
 	<proper@email.xx> <commit@email.xx>
 --
 which allows mailmap to replace only the email part of a commit, and:
 --
 	Proper Name <proper@email.xx> <commit@email.xx>
 --
 which allows mailmap to replace both the name and the email of a
 commit matching the specified commit email address, and:
 --
 	Proper Name <proper@email.xx> Commit Name <commit@email.xx>
 --
 which allows mailmap to replace both the name and the email of a
 commit matching both the specified commit name and email address.
 Both E-Mails and names are matched case-insensitively. For example
 this would also match the 'Commit Name <commit&#64;email.xx>' above:
 --
 	Proper Name <proper@email.xx> CoMmIt NaMe <CoMmIt@EmAiL.xX>
 --
 EXAMPLES
 --------
 Your history contains commits by two authors, Jane
 and Joe, whose names appear in the repository under several forms:
 ------------
 Joe Developer <joe@example.com>
 Joe R. Developer <joe@example.com>
 Jane Doe <jane@example.com>
 Jane Doe <jane@laptop.(none)>
 Jane D. <jane@desktop.(none)>
 ------------
 Now suppose that Joe wants his middle name initial used, and Jane
 prefers her family name fully spelled out. A `.mailmap` file to
 correct the names would look like:
 ------------
 Joe R. Developer <joe@example.com>
 Jane Doe <jane@example.com>
 Jane Doe <jane@desktop.(none)>
 ------------
 Note that there's no need to map the name for '<jane&#64;laptop.(none)>' to
 only correct the names. However, leaving the obviously broken
 '<jane&#64;laptop.(none)>' and '<jane&#64;desktop.(none)>' E-Mails as-is is
 usually not what you want. A `.mailmap` file which also corrects those
 is:
 ------------
 Joe R. Developer <joe@example.com>
 Jane Doe <jane@example.com> <jane@laptop.(none)>
 Jane Doe <jane@example.com> <jane@desktop.(none)>
 ------------
 Finally, let's say that Joe and Jane shared an E-Mail address, but not
 a name, e.g. by having these two commits in the history generated by a
 bug reporting system. I.e. names appearing in history as:
 ------------
 Joe <bugs@example.com>
 Jane <bugs@example.com>
 ------------
 A full `.mailmap` file which also handles those cases (an addition of
 two lines to the above example) would be:
 ------------
 Joe R. Developer <joe@example.com>
 Jane Doe <jane@example.com> <jane@laptop.(none)>
 Jane Doe <jane@example.com> <jane@desktop.(none)>
 Joe R. Developer <joe@example.com> Joe <bugs@example.com>
 Jane Doe <jane@example.com> Jane <bugs@example.com>
 ------------
 SEE ALSO
 --------
 linkgit:git-check-mailmap[1]
 GIT
 ---
 Part of the linkgit:git[1] suite

2

Documentation/i18n.txt

View File

 @ -38,7 +38,7 @@ mind.
   a warning if the commit log message given to it does not look
   like a valid UTF-8 string, unless you explicitly say your
   project uses a legacy encoding.  The way to say this is to
   have i18n.commitencoding in `.git/config` file, like this:
   have `i18n.commitEncoding` in `.git/config` file, like this:
 +
 ------------
 [i18n]

75

Documentation/mailmap.txt

View File

 @ -1,75 +0,0 @@
 If the file `.mailmap` exists at the toplevel of the repository, or at
 the location pointed to by the mailmap.file or mailmap.blob
 configuration options, it
 is used to map author and committer names and email addresses to
 canonical real names and email addresses.
 In the simple form, each line in the file consists of the canonical
 real name of an author, whitespace, and an email address used in the
 commit (enclosed by '<' and '>') to map to the name. For example:
 --
 	Proper Name <commit@email.xx>
 --
 The more complex forms are:
 --
 	<proper@email.xx> <commit@email.xx>
 --
 which allows mailmap to replace only the email part of a commit, and:
 --
 	Proper Name <proper@email.xx> <commit@email.xx>
 --
 which allows mailmap to replace both the name and the email of a
 commit matching the specified commit email address, and:
 --
 	Proper Name <proper@email.xx> Commit Name <commit@email.xx>
 --
 which allows mailmap to replace both the name and the email of a
 commit matching both the specified commit name and email address.
 Example 1: Your history contains commits by two authors, Jane
 and Joe, whose names appear in the repository under several forms:
 ------------
 Joe Developer <joe@example.com>
 Joe R. Developer <joe@example.com>
 Jane Doe <jane@example.com>
 Jane Doe <jane@laptop.(none)>
 Jane D. <jane@desktop.(none)>
 ------------
 Now suppose that Joe wants his middle name initial used, and Jane
 prefers her family name fully spelled out. A proper `.mailmap` file
 would look like:
 ------------
 Jane Doe         <jane@desktop.(none)>
 Joe R. Developer <joe@example.com>
 ------------
 Note how there is no need for an entry for `<jane@laptop.(none)>`, because the
 real name of that author is already correct.
 Example 2: Your repository contains commits from the following
 authors:
 ------------
 nick1 <bugs@company.xx>
 nick2 <bugs@company.xx>
 nick2 <nick2@company.xx>
 santa <me@company.xx>
 claus <me@company.xx>
 CTO <cto@coompany.xx>
 ------------
 Then you might want a `.mailmap` file that looks like:
 ------------
 <cto@company.xx>                       <cto@coompany.xx>
 Some Dude <some@dude.xx>         nick1 <bugs@company.xx>
 Other Author <other@author.xx>   nick2 <bugs@company.xx>
 Other Author <other@author.xx>         <nick2@company.xx>
 Santa Claus <santa.claus@northpole.xx> <me@company.xx>
 ------------
 Use hash '#' for comments that are either on their own line, or after
 the email address.

34

Documentation/pretty-formats.txt

View File

 @ -252,7 +252,15 @@ endif::git-rev-list[]
 			  interpreted by
 			  linkgit:git-interpret-trailers[1]. The
 			  `trailers` string may be followed by a colon
 			  and zero or more comma-separated options:
 			  and zero or more comma-separated options.
 			  If any option is provided multiple times the
 			  last occurance wins.
 +
 The boolean options accept an optional value `[=<BOOL>]`. The values
 `true`, `false`, `on`, `off` etc. are all accepted. See the "boolean"
 sub-section in "EXAMPLES" in linkgit:git-config[1]. If a boolean
 option is given with no value, it's enabled.
 +
 ** 'key=<K>': only show trailers with specified key. Matching is done
    case-insensitively and trailing colon is optional. If option is
    given multiple times trailer lines matching any of the keys are
 @ -261,27 +269,25 @@ endif::git-rev-list[]
    desired it can be disabled with `only=false`.  E.g.,
    `%(trailers:key=Reviewed-by)` shows trailer lines with key
    `Reviewed-by`.
 ** 'only[=val]': select whether non-trailer lines from the trailer
    block should be included. The `only` keyword may optionally be
    followed by an equal sign and one of `true`, `on`, `yes` to omit or
    `false`, `off`, `no` to show the non-trailer lines. If option is
    given without value it is enabled. If given multiple times the last
    value is used.
 ** 'only[=<BOOL>]': select whether non-trailer lines from the trailer
    block should be included.
 ** 'separator=<SEP>': specify a separator inserted between trailer
    lines. When this option is not given each trailer line is
    terminated with a line feed character. The string SEP may contain
    the literal formatting codes described above. To use comma as
    separator one must use `%x2C` as it would otherwise be parsed as
    next option. If separator option is given multiple times only the
    last one is used. E.g., `%(trailers:key=Ticket,separator=%x2C )`
    next option. E.g., `%(trailers:key=Ticket,separator=%x2C )`
    shows all trailer lines whose key is "Ticket" separated by a comma
    and a space.
 ** 'unfold[=val]': make it behave as if interpret-trailer's `--unfold`
    option was given. In same way as to for `only` it can be followed
    by an equal sign and explicit value. E.g.,
 ** 'unfold[=<BOOL>]': make it behave as if interpret-trailer's `--unfold`
    option was given. E.g.,
    `%(trailers:only,unfold=true)` unfolds and shows all trailer lines.
 ** 'valueonly[=val]': skip over the key part of the trailer line and only
    show the value part. Also this optionally allows explicit value.
 ** 'keyonly[=<BOOL>]': only show the key part of the trailer.
 ** 'valueonly[=<BOOL>]': only show the value part of the trailer.
 ** 'key_value_separator=<SEP>': specify a separator inserted between
    trailer lines. When this option is not given each trailer key-value
    pair is separated by ": ". Otherwise it shares the same semantics
    as 'separator=<SEP>' above.
 NOTE: Some placeholders may depend on other options given to the
 revision traversal engine. For example, the `%g*` reflog options will

14

Documentation/rev-list-options.txt

View File

 @ -129,6 +129,11 @@ parents) and `--max-parents=-1` (negative numbers denote no upper limit).
 	adjusting to updated upstream from time to time, and
 	this option allows you to ignore the individual commits
 	brought in to your history by such a merge.
 ifdef::git-log[]
 +
 This option also changes default diff format for merge commits
 to `first-parent`, see `--diff-merges=first-parent` for details.
 endif::git-log[]
 --not::
 	Reverses the meaning of the '{caret}' prefix (or lack thereof)
 @ -222,6 +227,15 @@ ifdef::git-rev-list[]
 	test the exit status to see if a range of objects is fully
 	connected (or not).  It is faster than redirecting stdout
 	to `/dev/null` as the output does not have to be formatted.
 --disk-usage::
 	Suppress normal output; instead, print the sum of the bytes used
 	for on-disk storage by the selected commits or objects. This is
 	equivalent to piping the output into `git cat-file
 	--batch-check='%(objectsize:disk)'`, except that it runs much
 	faster (especially with `--use-bitmap-index`). See the `CAVEATS`
 	section in linkgit:git-cat-file[1] for the limitations of what
 	"on-disk storage" means.
 endif::git-rev-list[]
 --cherry-mark::

116

Documentation/technical/chunk-format.txt Normal file

View File

 @ -0,0 +1,116 @@
 Chunk-based file formats
 ========================
 Some file formats in Git use a common concept of "chunks" to describe
 sections of the file. This allows structured access to a large file by
 scanning a small "table of contents" for the remaining data. This common
 format is used by the `commit-graph` and `multi-pack-index` files. See
 link:technical/pack-format.html[the `multi-pack-index` format] and
 link:technical/commit-graph-format.html[the `commit-graph` format] for
 how they use the chunks to describe structured data.
 A chunk-based file format begins with some header information custom to
 that format. That header should include enough information to identify
 the file type, format version, and number of chunks in the file. From this
 information, that file can determine the start of the chunk-based region.
 The chunk-based region starts with a table of contents describing where
 each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
 where C is the number of chunks. Consider the following table:
   | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
   |--------------------|------------------------|
   | ID[0]              | OFFSET[0]              |
   | ...                | ...                    |
   | ID[C]              | OFFSET[C]              |
   | 0x0000             | OFFSET[C+1]            |
 Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
 Each integer is stored in network-byte order.
 The chunk identifier `ID[i]` is a label for the data stored within this
 fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
 size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
 and `OFFSET[i]`. This requires that the chunk data appears contiguously
 in the same order as the table of contents.
 The final entry in the table of contents must be four zero bytes. This
 confirms that the table of contents is ending and provides the offset for
 the end of the chunk-based data.
 Note: The chunk-based format expects that the file contains _at least_ a
 trailing hash after `OFFSET[C+1]`.
 Functions for working with chunk-based file formats are declared in
 `chunk-format.h`. Using these methods provide extra checks that assist
 developers when creating new file formats.
 Writing chunk-based file formats
 --------------------------------
 To write a chunk-based file format, create a `struct chunkfile` by
 calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
 caller is responsible for opening the `hashfile` and writing header
 information so the file format is identifiable before the chunk-based
 format begins.
 Then, call `add_chunk()` for each chunk that is intended for write. This
 populates the `chunkfile` with information about the order and size of
 each chunk to write. Provide a `chunk_write_fn` function pointer to
 perform the write of the chunk data upon request.
 Call `write_chunkfile()` to write the table of contents to the `hashfile`
 followed by each of the chunks. This will verify that each chunk wrote
 the expected amount of data so the table of contents is correct.
 Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
 caller is responsible for finalizing the `hashfile` by writing the trailing
 hash and closing the file.
 Reading chunk-based file formats
 --------------------------------
 To read a chunk-based file format, the file must be opened as a
 memory-mapped region. The chunk-format API expects that the entire file
 is mapped as a contiguous memory region.
 Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
 After reading the header information from the beginning of the file,
 including the chunk count, call `read_table_of_contents()` to populate
 the `struct chunkfile` with the list of chunks, their offsets, and their
 sizes.
 Extract the data information for each chunk using `pair_chunk()` or
 `read_chunk()`:
 * `pair_chunk()` assigns a given pointer with the location inside the
   memory-mapped file corresponding to that chunk's offset. If the chunk
   does not exist, then the pointer is not modified.
 * `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
   with the appropriate initial pointer and size information. The function
   is not called if the chunk does not exist. Use this method to read chunks
   if you need to perform immediate parsing or if you need to execute logic
   based on the size of the chunk.
 After calling these methods, call `free_chunkfile()` to clear the
 `struct chunkfile` data. This will not close the memory-mapped region.
 Callers are expected to own that data for the timeframe the pointers into
 the region are needed.
 Examples
 --------
 These file formats use the chunk-format API, and can be used as examples
 for future formats:
 * *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
   in `commit-graph.c` for how the chunk-format API is used to write and
   parse the commit-graph file format documented in
   link:technical/commit-graph-format.html[the commit-graph file format].
 * *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
   in `midx.c` for how the chunk-format API is used to write and
   parse the multi-pack-index file format documented in
   link:technical/pack-format.html[the multi-pack-index file format].

31

Documentation/technical/commit-graph-format.txt

View File

 @ -4,11 +4,7 @@ Git commit graph format
 The Git commit graph stores a list of commit OIDs and some associated
 metadata, including:
 - The generation number of the commit. Commits with no parents have
   generation number 1; commits with parents have generation number
   one more than the maximum generation number of its parents. We
   reserve zero as special, and can be used to mark a generation
   number invalid or as "not computed".
 - The generation number of the commit.
 - The root tree OID.
 @ -65,6 +61,9 @@ CHUNK LOOKUP:
       the length using the next chunk position if necessary.) Each chunk
       ID appears at most once.
   The CHUNK LOOKUP matches the table of contents from
   link:technical/chunk-format.html[the chunk-based file format].
   The remaining data in the body is described one chunk at a time, and
   these chunks may be given in any order. Chunks are required unless
   otherwise specified.
 @ -86,13 +85,33 @@ CHUNK DATA:
       position. If there are more than two parents, the second value
       has its most-significant bit on and the other bits store an array
       position into the Extra Edge List chunk.
     * The next 8 bytes store the generation number of the commit and
     * The next 8 bytes store the topological level (generation number v1)
       of the commit and
       the commit time in seconds since EPOCH. The generation number
       uses the higher 30 bits of the first 4 bytes, while the commit
       time uses the 32 bits of the second 4 bytes, along with the lowest
 bits of the lowest byte, storing the 33rd and 34th bit of the
       commit time.
   Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
     * This list of 4-byte values store corrected commit date offsets for the
       commits, arranged in the same order as commit data chunk.
     * If the corrected commit date offset cannot be stored within 31 bits,
       the value has its most-significant bit on and the other bits store
       the position of corrected commit date into the Generation Data Overflow
       chunk.
     * Generation Data chunk is present only when commit-graph file is written
       by compatible versions of Git and in case of split commit-graph chains,
       the topmost layer also has Generation Data chunk.
   Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
     * This list of 8-byte values stores the corrected commit date offsets
       for commits with corrected commit date offsets that cannot be
       stored within 31 bits.
     * Generation Data Overflow chunk is present only when Generation Data
       chunk is present and atleast one corrected commit date offset cannot
       be stored within 31 bits.
   Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
       This list of 4-byte values store the second through nth parents for
       all octopus merges. The second parent value in the commit data stores

77

Documentation/technical/commit-graph.txt

View File

 @ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph:
 Values 1-4 satisfy the requirements of parse_commit_gently().
 Define the "generation number" of a commit recursively as follows:
 There are two definitions of generation number:
 . Corrected committer dates (generation number v2)
 . Topological levels (generation nummber v1)
  * A commit with no parents (a root commit) has generation number one.
 Define "corrected committer date" of a commit recursively as follows:
  * A commit with at least one parent has generation number one more than
    the largest generation number among its parents.
  * A commit with no parents (a root commit) has corrected committer date
     equal to its committer date.
 Equivalently, the generation number of a commit A is one more than the
  * A commit with at least one parent has corrected committer date equal to
     the maximum of its commiter date and one more than the largest corrected
     committer date among its parents.
  * As a special case, a root commit with timestamp zero has corrected commit
     date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO
     (that is, an uncomputed corrected commit date).
 Define the "topological level" of a commit recursively as follows:
  * A commit with no parents (a root commit) has topological level of one.
  * A commit with at least one parent has topological level one more than
    the largest topological level among its parents.
 Equivalently, the topological level of a commit A is one more than the
 length of a longest path from A to a root commit. The recursive definition
 is easier to use for computation and observing the following property:
 @ -60,6 +77,9 @@ is easier to use for computation and observing the following property:
     generation numbers, then we always expand the boundary commit with highest
     generation number and can easily detect the stopping condition.
 The property applies to both versions of generation number, that is both
 corrected committer dates and topological levels.
 This property can be used to significantly reduce the time it takes to
 walk commits and determine topological relationships. Without generation
 numbers, the general heuristic is the following:
 @ -67,7 +87,9 @@ numbers, the general heuristic is the following:
     If A and B are commits with commit time X and Y, respectively, and
     X < Y, then A _probably_ cannot reach B.
 This heuristic is currently used whenever the computation is allowed to
 In absence of corrected commit dates (for example, old versions of Git or
 mixed generation graph chains),
 this heuristic is currently used whenever the computation is allowed to
 violate topological relationships due to clock skew (such as "git log"
 with default order), but is not used when the topological order is
 required (such as merge base calculations, "git log --graph").
 @ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite"
 generation number and walk until reaching commits with known generation
 number.
 We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
 We use the macro GENERATION_NUMBER_INFINITY to mark commits not
 in the commit-graph file. If a commit-graph file was written by a version
 of Git that did not compute generation numbers, then those commits will
 have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
 @ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in
 walking a few extra commits, but the simplicity in dealing with commits
 with generation number *_INFINITY or *_ZERO is valuable.
 We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
 generation numbers are computed to be at least this value. We limit at
 this value since it is the largest value that can be stored in the
 commit-graph file using the 30 bits available to generation numbers. This
 presents another case where a commit can have generation number equal to
 that of a parent.
 We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose
 topological levels (generation number v1) are computed to be at least
 this value. We limit at this value since it is the largest value that
 can be stored in the commit-graph file using the 30 bits available
 to topological levels. This presents another case where a commit can
 have generation number equal to that of a parent.
 Design Details
 --------------
 @ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.
 ## Handling Mixed Generation Number Chains
 With the introduction of generation number v2 and generation data chunk, the
 following scenario is possible:
 . "New" Git writes a commit-graph with the corrected commit dates.
 . "Old" Git writes a split commit-graph on top without corrected commit dates.
 A naive approach of using the newest available generation number from
 each layer would lead to violated expectations: the lower layer would
 use corrected commit dates which are much larger than the topological
 levels of the higher layer. For this reason, Git inspects the topmost
 layer to see if the layer is missing corrected commit dates. In such a case
 Git only uses topological level for generation numbers.
 When writing a new layer in split commit-graph, we write corrected commit
 dates if the topmost layer has corrected commit dates written. This
 guarantees that if a layer has corrected commit dates, all lower layers
 must have corrected commit dates as well.
 When merging layers, we do not consider whether the merged layers had corrected
 commit dates. Instead, the new layer will have corrected commit dates if the
 layer below the new layer has corrected commit dates.
 While writing or merging layers, if the new layer is the only layer, it will
 have corrected commit dates when written by compatible versions of Git. Thus,
 rewriting split commit-graph as a single file (`--split=replace`) creates a
 single layer with corrected commit dates.
 ## Deleting graph-{hash} files
 After a new tip file is written, some `graph-{hash}` files may no longer

293

Documentation/technical/hash-function-transition.txt

View File

 @ -33,16 +33,9 @@ researchers. On 23 February 2017 the SHAttered attack
 Git v2.13.0 and later subsequently moved to a hardened SHA-1
 implementation by default, which isn't vulnerable to the SHAttered
 attack.
 attack, but SHA-1 is still weak.
 Thus Git has in effect already migrated to a new hash that isn't SHA-1
 and doesn't share its vulnerabilities, its new hash function just
 happens to produce exactly the same output for all known inputs,
 except two PDFs published by the SHAttered researchers, and the new
 implementation (written by those researchers) claims to detect future
 cryptanalytic collision attacks.
 Regardless, it's considered prudent to move past any variant of SHA-1
 Thus it's considered prudent to move past any variant of SHA-1
 to a new hash. There's no guarantee that future attacks on SHA-1 won't
 be published in the future, and those attacks may not have viable
 mitigations.
 @ -57,6 +50,38 @@ SHA-1 still possesses the other properties such as fast object lookup
 and safe error checking, but other hash functions are equally suitable
 that are believed to be cryptographically secure.
 Choice of Hash
 --------------
 The hash to replace the hardened SHA-1 should be stronger than SHA-1
 was: we would like it to be trustworthy and useful in practice for at
 least 10 years.
 Some other relevant properties:
 . A 256-bit hash (long enough to match common security practice; not
    excessively long to hurt performance and disk usage).
 . High quality implementations should be widely available (e.g., in
    OpenSSL and Apple CommonCrypto).
 . The hash function's properties should match Git's needs (e.g. Git
    requires collision and 2nd preimage resistance and does not require
    length extension resistance).
 . As a tiebreaker, the hash should be fast to compute (fortunately
    many contenders are faster than SHA-1).
 There were several contenders for a successor hash to SHA-1, including
 SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256.
 In late 2018 the project picked SHA-256 as its successor hash.
 See 0ed8d8da374 (doc hash-function-transition: pick SHA-256 as
 NewHash, 2018-08-04) and numerous mailing list threads at the time,
 particularly the one starting at
 https://lore.kernel.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
 for more information.
 Goals
 -----
 . The transition to SHA-256 can be done one local repository at a time.
 @ -94,7 +119,7 @@ Overview
 --------
 We introduce a new repository format extension. Repositories with this
 extension enabled use SHA-256 instead of SHA-1 to name their objects.
 This affects both object names and object content --- both the names
 This affects both object names and object content -- both the names
 of objects and all references to other objects within an object are
 switched to the new hash function.
 @ -107,7 +132,7 @@ mapping to allow naming objects using either their SHA-1 and SHA-256 names
 interchangeably.
 "git cat-file" and "git hash-object" gain options to display an object
 in its sha1 form and write an object given its sha1 form. This
 in its SHA-1 form and write an object given its SHA-1 form. This
 requires all objects referenced by that object to be present in the
 object database so that they can be named using the appropriate name
 (using the bidirectional hash mapping).
 @ -115,7 +140,7 @@ object database so that they can be named using the appropriate name
 Fetches from a SHA-1 based server convert the fetched objects into
 SHA-256 form and record the mapping in the bidirectional mapping table
 (see below for details). Pushes to a SHA-1 based server convert the
 objects being pushed into sha1 form so the server does not have to be
 objects being pushed into SHA-1 form so the server does not have to be
 aware of the hash function the client is using.
 Detailed Design
 @ -151,38 +176,38 @@ repository extensions.
 Object names
 ~~~~~~~~~~~~
 Objects can be named by their 40 hexadecimal digit sha1-name or 64
 hexadecimal digit sha256-name, plus names derived from those (see
 Objects can be named by their 40 hexadecimal digit SHA-1 name or 64
 hexadecimal digit SHA-256 name, plus names derived from those (see
 gitrevisions(7)).
 The sha1-name of an object is the SHA-1 of the concatenation of its
 type, length, a nul byte, and the object's sha1-content. This is the
 The SHA-1 name of an object is the SHA-1 of the concatenation of its
 type, length, a nul byte, and the object's SHA-1 content. This is the
 traditional <sha1> used in Git to name objects.
 The sha256-name of an object is the SHA-256 of the concatenation of its
 type, length, a nul byte, and the object's sha256-content.
 The SHA-256 name of an object is the SHA-256 of the concatenation of its
 type, length, a nul byte, and the object's SHA-256 content.
 Object format
 ~~~~~~~~~~~~~
 The content as a byte sequence of a tag, commit, or tree object named
 by sha1 and sha256 differ because an object named by sha256-name refers to
 other objects by their sha256-names and an object named by sha1-name
 refers to other objects by their sha1-names.
 by SHA-1 and SHA-256 differ because an object named by SHA-256 name refers to
 other objects by their SHA-256 names and an object named by SHA-1 name
 refers to other objects by their SHA-1 names.
 The sha256-content of an object is the same as its sha1-content, except
 that objects referenced by the object are named using their sha256-names
 instead of sha1-names. Because a blob object does not refer to any
 other object, its sha1-content and sha256-content are the same.
 The SHA-256 content of an object is the same as its SHA-1 content, except
 that objects referenced by the object are named using their SHA-256 names
 instead of SHA-1 names. Because a blob object does not refer to any
 other object, its SHA-1 content and SHA-256 content are the same.
 The format allows round-trip conversion between sha256-content and
 sha1-content.
 The format allows round-trip conversion between SHA-256 content and
 SHA-1 content.
 Object storage
 ~~~~~~~~~~~~~~
 Loose objects use zlib compression and packed objects use the packed
 format described in Documentation/technical/pack-format.txt, just like
 today. The content that is compressed and stored uses sha256-content
 instead of sha1-content.
 today. The content that is compressed and stored uses SHA-256 content
 instead of SHA-1 content.
 Pack index
 ~~~~~~~~~~
 @ -191,21 +216,21 @@ hash functions. They have the following format (all integers are in
 network byte order):
 - A header appears at the beginning and consists of the following:
   - The 4-byte pack index signature: '\377t0c'
   - 4-byte version number: 3
   - 4-byte length of the header section, including the signature and
   * The 4-byte pack index signature: '\377t0c'
   * 4-byte version number: 3
   * 4-byte length of the header section, including the signature and
     version number
   - 4-byte number of objects contained in the pack
   - 4-byte number of object formats in this pack index: 2
   - For each object format:
     - 4-byte format identifier (e.g., 'sha1' for SHA-1)
     - 4-byte length in bytes of shortened object names. This is the
   * 4-byte number of objects contained in the pack
   * 4-byte number of object formats in this pack index: 2
   * For each object format:
     ** 4-byte format identifier (e.g., 'sha1' for SHA-1)
     ** 4-byte length in bytes of shortened object names. This is the
       shortest possible length needed to make names in the shortened
       object name table unambiguous.
     - 4-byte integer, recording where tables relating to this format
     ** 4-byte integer, recording where tables relating to this format
       are stored in this index file, as an offset from the beginning.
   - 4-byte offset to the trailer from the beginning of this file.
   - Zero or more additional key/value pairs (4-byte key, 4-byte
   * 4-byte offset to the trailer from the beginning of this file.
   * Zero or more additional key/value pairs (4-byte key, 4-byte
     value). Only one key is supported: 'PSRC'. See the "Loose objects
     and unreachable objects" section for supported values and how this
     is used.  All other keys are reserved. Readers must ignore
 @ -213,37 +238,36 @@ network byte order):
 - Zero or more NUL bytes. This can optionally be used to improve the
   alignment of the full object name table below.
 - Tables for the first object format:
   - A sorted table of shortened object names.  These are prefixes of
   * A sorted table of shortened object names.  These are prefixes of
     the names of all objects in this pack file, packed together
     without offset values to reduce the cache footprint of the binary
     search for a specific object name.
   - A table of full object names in pack order. This allows resolving
   * A table of full object names in pack order. This allows resolving
     a reference to "the nth object in the pack file" (from a
     reachability bitmap or from the next table of another object
     format) to its object name.
   - A table of 4-byte values mapping object name order to pack order.
   * A table of 4-byte values mapping object name order to pack order.
     For an object in the table of sorted shortened object names, the
     value at the corresponding index in this table is the index in the
     previous table for that same object.
     This can be used to look up the object in reachability bitmaps or
     to look up its name in another object format.
   - A table of 4-byte CRC32 values of the packed object data, in the
   * A table of 4-byte CRC32 values of the packed object data, in the
     order that the objects appear in the pack file. This is to allow
     compressed data to be copied directly from pack to pack during
     repacking without undetected data corruption.
   - A table of 4-byte offset values. For an object in the table of
   * A table of 4-byte offset values. For an object in the table of
     sorted shortened object names, the value at the corresponding
     index in this table indicates where that object can be found in
     the pack file. These are usually 31-bit pack file offsets, but
     large offsets are encoded as an index into the next table with the
     most significant bit set.
   - A table of 8-byte offset entries (empty for pack files less than
   * A table of 8-byte offset entries (empty for pack files less than
 GiB). Pack files are organized with heavily used objects toward
     the front, so most object references should not need to refer to
     this table.
 @ -252,10 +276,10 @@ network byte order):
   up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
   - A copy of the 20-byte SHA-256 checksum at the end of the
   * A copy of the 20-byte SHA-256 checksum at the end of the
     corresponding packfile.
   - 20-byte SHA-256 checksum of all of the above.
   * 20-byte SHA-256 checksum of all of the above.
 Loose object index
 ~~~~~~~~~~~~~~~~~~
 @ -288,18 +312,18 @@ To remove entries (e.g. in "git pack-refs" or "git-prune"):
 Translation table
 ~~~~~~~~~~~~~~~~~
 The index files support a bidirectional mapping between sha1-names
 and sha256-names. The lookup proceeds similarly to ordinary object
 lookups. For example, to convert a sha1-name to a sha256-name:
 The index files support a bidirectional mapping between SHA-1 names
 and SHA-256 names. The lookup proceeds similarly to ordinary object
 lookups. For example, to convert a SHA-1 name to a SHA-256 name:
 . Look for the object in idx files. If a match is present in the
     idx's sorted list of truncated sha1-names, then:
     a. Read the corresponding entry in the sha1-name order to pack
     idx's sorted list of truncated SHA-1 names, then:
     a. Read the corresponding entry in the SHA-1 name order to pack
        name order mapping.
     b. Read the corresponding entry in the full sha1-name table to
     b. Read the corresponding entry in the full SHA-1 name table to
        verify we found the right object. If it is, then
     c. Read the corresponding entry in the full sha256-name table.
        That is the object's sha256-name.
     c. Read the corresponding entry in the full SHA-256 name table.
        That is the object's SHA-256 name.
 . Check for a loose object. Read lines from loose-object-idx until
     we find a match.
 @ -313,10 +337,10 @@ Since all operations that make new objects (e.g., "git commit") add
 the new objects to the corresponding index, this mapping is possible
 for all objects in the object store.
 Reading an object's sha1-content
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The sha1-content of an object can be read by converting all sha256-names
 its sha256-content references to sha1-names using the translation table.
 Reading an object's SHA-1 content
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The SHA-1 content of an object can be read by converting all SHA-256 names
 of its SHA-256 content references to SHA-1 names using the translation table.
 Fetch
 ~~~~~
 @ -339,7 +363,7 @@ the following steps:
 . index-pack: inflate each object in the packfile and compute its
    SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
    objects the client has locally. These objects can be looked up
    using the translation table and their sha1-content read as
    using the translation table and their SHA-1 content read as
    described above to resolve the deltas.
 . topological sort: starting at the "want"s from the negotiation
    phase, walk through objects in the pack and emit a list of them,
 @ -348,12 +372,12 @@ the following steps:
    (This list only contains objects reachable from the "wants". If the
    pack from the server contained additional extraneous objects, then
    they will be discarded.)
 . convert to sha256: open a new (sha256) packfile. Read the topologically
 . convert to SHA-256: open a new SHA-256 packfile. Read the topologically
    sorted list just generated. For each object, inflate its
    sha1-content, convert to sha256-content, and write it to the sha256
    pack. Record the new sha1<->sha256 mapping entry for use in the idx.
    SHA-1 content, convert to SHA-256 content, and write it to the SHA-256
    pack. Record the new SHA-1<-->SHA-256 mapping entry for use in the idx.
 . sort: reorder entries in the new pack to match the order of objects
    in the pack the server generated and include blobs. Write a sha256 idx
    in the pack the server generated and include blobs. Write a SHA-256 idx
    file
 . clean up: remove the SHA-1 based pack file, index, and
    topologically sorted list obtained from the server in steps 1
 @ -378,19 +402,20 @@ experimenting to get this to perform well.
 Push
 ~~~~
 Push is simpler than fetch because the objects referenced by the
 pushed objects are already in the translation table. The sha1-content
 pushed objects are already in the translation table. The SHA-1 content
 of each object being pushed can be read as described in the "Reading
 an object's sha1-content" section to generate the pack written by git
 an object's SHA-1 content" section to generate the pack written by git
 send-pack.
 Signed Commits
 ~~~~~~~~~~~~~~
 We add a new field "gpgsig-sha256" to the commit object format to allow
 signing commits without relying on SHA-1. It is similar to the
 existing "gpgsig" field. Its signed payload is the sha256-content of the
 existing "gpgsig" field. Its signed payload is the SHA-256 content of the
 commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
 This means commits can be signed
 . using SHA-1 only, as in existing signed commit objects
 . using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
    fields.
 @ -404,10 +429,11 @@ Signed Tags
 ~~~~~~~~~~~
 We add a new field "gpgsig-sha256" to the tag object format to allow
 signing tags without relying on SHA-1. Its signed payload is the
 sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
 SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
 SIGNATURE-----" delimited in-body signature removed.
 This means tags can be signed
 . using SHA-1 only, as in existing signed tag objects
 . using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
    signature.
 @ -415,11 +441,11 @@ This means tags can be signed
 Mergetag embedding
 ~~~~~~~~~~~~~~~~~~
 The mergetag field in the sha1-content of a commit contains the
 sha1-content of a tag that was merged by that commit.
 The mergetag field in the SHA-1 content of a commit contains the
 SHA-1 content of a tag that was merged by that commit.
 The mergetag field in the sha256-content of the same commit contains the
 sha256-content of the same tag.
 The mergetag field in the SHA-256 content of the same commit contains the
 SHA-256 content of the same tag.
 Submodules
 ~~~~~~~~~~
 @ -494,7 +520,7 @@ Caveats
 -------
 Invalid objects
 ~~~~~~~~~~~~~~~
 The conversion from sha1-content to sha256-content retains any
 The conversion from SHA-1 content to SHA-256 content retains any
 brokenness in the original object (e.g., tree entry modes encoded with
 leading 0, tree objects whose paths are not sorted correctly, and
 commit objects without an author or committer). This is a deliberate
 @ -513,15 +539,15 @@ allow lifting this restriction.
 Alternates
 ~~~~~~~~~~
 For the same reason, a sha256 repository cannot borrow objects from a
 sha1 repository using objects/info/alternates or
 For the same reason, a SHA-256 repository cannot borrow objects from a
 SHA-1 repository using objects/info/alternates or
 $GIT_ALTERNATE_OBJECT_REPOSITORIES.
 git notes
 ~~~~~~~~~
 The "git notes" tool annotates objects using their sha1-name as key.
 The "git notes" tool annotates objects using their SHA-1 name as key.
 This design does not describe a way to migrate notes trees to use
 sha256-names. That migration is expected to happen separately (for
 SHA-256 names. That migration is expected to happen separately (for
 example using a file at the root of the notes tree to describe which
 hash it uses).
 @ -555,7 +581,7 @@ unclear:
 	Git 2.12
 Does this mean Git v2.12.0 is the commit with sha1-name
 Does this mean Git v2.12.0 is the commit with SHA-1 name
 e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
 new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
 @ -598,44 +624,12 @@ The user can also explicitly specify which format to use for a
 particular revision specifier and for output, overriding the mode. For
 example:
 git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
 Choice of Hash
 --------------
 In early 2005, around the time that Git was written, Xiaoyun Wang,
 Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1
 collisions in 2^69 operations. In August they published details.
 Luckily, no practical demonstrations of a collision in full SHA-1 were
 published until 10 years later, in 2017.
 Git v2.13.0 and later subsequently moved to a hardened SHA-1
 implementation by default that mitigates the SHAttered attack, but
 SHA-1 is still believed to be weak.
 The hash to replace this hardened SHA-1 should be stronger than SHA-1
 was: we would like it to be trustworthy and useful in practice for at
 least 10 years.
 Some other relevant properties:
 . A 256-bit hash (long enough to match common security practice; not
    excessively long to hurt performance and disk usage).
 . High quality implementations should be widely available (e.g., in
    OpenSSL and Apple CommonCrypto).
 . The hash function's properties should match Git's needs (e.g. Git
    requires collision and 2nd preimage resistance and does not require
    length extension resistance).
 . As a tiebreaker, the hash should be fast to compute (fortunately
    many contenders are faster than SHA-1).
 We choose SHA-256.
     git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
 Transition plan
 ---------------
 Some initial steps can be implemented independently of one another:
 - adding a hash function API (vtable)
 - teaching fsck to tolerate the gpgsig-sha256 field
 - excluding gpgsig-* from the fields copied by "git commit --amend"
 @ -647,9 +641,9 @@ Some initial steps can be implemented independently of one another:
 - introducing index v3
 - adding support for the PSRC field and safer object pruning
 The first user-visible change is the introduction of the objectFormat
 extension (without compatObjectFormat). This requires:
 - teaching fsck about this mode of operation
 - using the hash function API (vtable) when computing object names
 - signing objects and verifying signatures
 @ -657,6 +651,7 @@ extension (without compatObjectFormat). This requires:
   repository
 Next comes introduction of compatObjectFormat:
 - implementing the loose-object-idx
 - translating object names between object formats
 - translating object content between object formats
 @ -669,10 +664,11 @@ Next comes introduction of compatObjectFormat:
   "Object names on the command line" above)
 The next step is supporting fetches and pushes to SHA-1 repositories:
 - allow pushes to a repository using the compat format
 - generate a topologically sorted list of the SHA-1 names of fetched
   objects
 - convert the fetched packfile to sha256 format and generate an idx
 - convert the fetched packfile to SHA-256 format and generate an idx
   file
 - re-sort to match the order of objects in the fetched packfile
 @ -734,6 +730,7 @@ Using hash functions in parallel
 Objects newly created would be addressed by the new hash, but inside
 such an object (e.g. commit) it is still possible to address objects
 using the old hash function.
 * You cannot trust its history (needed for bisectability) in the
   future without further work
 * Maintenance burden as the number of supported hash functions grows
 @ -743,36 +740,38 @@ using the old hash function.
 Signed objects with multiple hashes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Instead of introducing the gpgsig-sha256 field in commit and tag objects
 for sha256-content based signatures, an earlier version of this design
 added "hash sha256 <sha256-name>" fields to strengthen the existing
 sha1-content based signatures.
 for SHA-256 content based signatures, an earlier version of this design
 added "hash sha256 <SHA-256 name>" fields to strengthen the existing
 SHA-1 content based signatures.
 In other words, a single signature was used to attest to the object
 content using both hash functions. This had some advantages:
 * Using one signature instead of two speeds up the signing process.
 * Having one signed payload with both hashes allows the signer to
   attest to the sha1-name and sha256-name referring to the same object.
   attest to the SHA-1 name and SHA-256 name referring to the same object.
 * All users consume the same signature. Broken signatures are likely
   to be detected quickly using current versions of git.
 However, it also came with disadvantages:
 * Verifying a signed object requires access to the sha1-names of all
 * Verifying a signed object requires access to the SHA-1 names of all
   objects it references, even after the transition is complete and
   translation table is no longer needed for anything else. To support
   this, the design added fields such as "hash sha1 tree <sha1-name>"
   and "hash sha1 parent <sha1-name>" to the sha256-content of a signed
   this, the design added fields such as "hash sha1 tree <SHA-1 name>"
   and "hash sha1 parent <SHA-1 name>" to the SHA-256 content of a signed
   commit, complicating the conversion process.
 * Allowing signed objects without a sha1 (for after the transition is
 * Allowing signed objects without a SHA-1 (for after the transition is
   complete) complicated the design further, requiring a "nohash sha1"
   field to suppress including "hash sha1" fields in the sha256-content
   field to suppress including "hash sha1" fields in the SHA-256 content
   and signed payload.
 Lazily populated translation table
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Some of the work of building the translation table could be deferred to
 push time, but that would significantly complicate and slow down pushes.
 Calculating the sha1-name at object creation time at the same time it is
 being streamed to disk and having its sha256-name calculated should be
 Calculating the SHA-1 name at object creation time at the same time it is
 being streamed to disk and having its SHA-256 name calculated should be
 an acceptable cost.
 Document History
 @ -782,18 +781,19 @@ Document History
 bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com,
 sbeller@google.com
 Initial version sent to
 http://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
 * Initial version sent to https://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
 -03-03 jrnieder@gmail.com
 Incorporated suggestions from jonathantanmy and sbeller:
 * describe purpose of signed objects with each hash type
 * redefine signed object verification using object content under the
 * Describe purpose of signed objects with each hash type
 * Redefine signed object verification using object content under the
   first hash function
 -03-06 jrnieder@gmail.com
 * Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
 * Make sha3-based signatures a separate field, avoiding the need for
 * Make SHA3-based signatures a separate field, avoiding the need for
   "hash" and "nohash" fields (thanks to peff[3]).
 * Add a sorting phase to fetch (thanks to Junio for noticing the need
   for this).
 @ -805,23 +805,26 @@ Incorporated suggestions from jonathantanmy and sbeller:
   especially Junio).
 -09-27 jrnieder@gmail.com, sbeller@google.com
 * use placeholder NewHash instead of SHA3-256
 * describe criteria for picking a hash function.
 * include a transition plan (thanks especially to Brandon Williams
 * Use placeholder NewHash instead of SHA3-256
 * Describe criteria for picking a hash function.
 * Include a transition plan (thanks especially to Brandon Williams
   for fleshing these ideas out)
 * define the translation table (thanks, Shawn Pearce[5], Jonathan
 * Define the translation table (thanks, Shawn Pearce[5], Jonathan
   Tan, and Masaya Suzuki)
 * avoid loose object overhead by packing more aggressively in
 * Avoid loose object overhead by packing more aggressively in
   "git gc --auto"
 Later history:
  See the history of this file in git.git for the history of subsequent
  edits. This document history is no longer being maintained as it
  would now be superfluous to the commit log
 * See the history of this file in git.git for the history of subsequent
   edits. This document history is no longer being maintained as it
   would now be superfluous to the commit log
 [1] http://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
 [2] http://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
 [3] http://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
 [4] http://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
 [5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/
 References:
  [1] https://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
  [2] https://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
  [3] https://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
  [4] https://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
  [5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/

42

Documentation/technical/index-format.txt

View File

 @ -26,7 +26,7 @@ Git index format
      Extensions are identified by signature. Optional extensions can
      be ignored if Git does not understand them.
      Git currently supports cached tree and resolve undo extensions.
      Git currently supports cache tree and resolve undo extensions.
 -byte extension signature. If the first byte is 'A'..'Z' the
      extension is optional and can be ignored.
 @ -136,14 +136,35 @@ Git index format
 == Extensions
 === Cached tree
 === Cache tree
   Cached tree extension contains pre-computed hashes for trees that can
   be derived from the index. It helps speed up tree object generation
   from index for a new commit.
   Since the index does not record entries for directories, the cache
   entries cannot describe tree objects that already exist in the object
   database for regions of the index that are unchanged from an existing
   commit. The cache tree extension stores a recursive tree structure that
   describes the trees that already exist and completely match sections of
   the cache entries. This speeds up tree object generation from the index
   for a new commit by only computing the trees that are "new" to that
   commit. It also assists when comparing the index to another tree, such
   as `HEAD^{tree}`, since sections of the index can be skipped when a tree
   comparison demonstrates equality.
   When a path is updated in index, the path must be invalidated and
   removed from tree cache.
   The recursive tree structure uses nodes that store a number of cache
   entries, a list of subnodes, and an object ID (OID). The OID references
   the existing tree for that node, if it is known to exist. The subnodes
   correspond to subdirectories that themselves have cache tree nodes. The
   number of cache entries corresponds to the number of cache entries in
   the index that describe paths within that tree's directory.
   The extension tracks the full directory structure in the cache tree
   extension, but this is generally smaller than the full cache entry list.
   When a path is updated in index, Git invalidates all nodes of the
   recursive cache tree corresponding to the parent directories of that
   path. We store these tree nodes as being "invalid" by using "-1" as the
   number of cache entries. Invalid nodes still store a span of index
   entries, allowing Git to focus its efforts when reconstructing a full
   cache tree.
   The signature for this extension is { 'T', 'R', 'E', 'E' }.
 @ -174,7 +195,8 @@ Git index format
   first entry represents the root level of the repository, followed by the
   first subtree--let's call this A--of the root level (with its name
   relative to the root level), followed by the first subtree of A (with
   its name relative to A), ...
   its name relative to A), and so on. The specified number of subtrees
   indicates when the current level of the recursive stack is complete.
 === Resolve undo
 @ -251,14 +273,14 @@ Git index format
   - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
     ctime field until "file size".
   - Stat data of core.excludesfile
   - Stat data of core.excludesFile
   - 32-bit dir_flags (see struct dir_struct)
   - Hash of $GIT_DIR/info/exclude. A null hash means the file
     does not exist.
   - Hash of core.excludesfile. A null hash means the file does
   - Hash of core.excludesFile. A null hash means the file does
     not exist.
   - NUL-terminated string of per-dir exclude file name. This usually

23

Documentation/technical/pack-format.txt

View File

 @ -274,6 +274,26 @@ Pack file entry: <+
     Index checksum of all of the above.
 == pack-*.rev files have the format:
   - A 4-byte magic number '0x52494458' ('RIDX').
   - A 4-byte version identifier (= 1).
   - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
   - A table of index positions (one per packed object, num_objects in
     total, each a 4-byte unsigned integer in network order), sorted by
     their corresponding offsets in the packfile.
   - A trailer, containing a:
     checksum of the corresponding packfile, and
     a checksum of all of the above.
 All 4-byte numbers are in network order.
 == multi-pack-index (MIDX) files have the following format:
 The multi-pack-index files refer to multiple pack-files and loose objects.
 @ -316,6 +336,9 @@ CHUNK LOOKUP:
 	    (Chunks are provided in file-order, so you can infer the length
 	    using the next chunk position if necessary.)
 	The CHUNK LOOKUP matches the table of contents from
 	link:technical/chunk-format.html[the chunk-based file format].
 	The remaining data in the body is described one chunk at a time, and
 	these chunks may be given in any order. Chunks are required unless
 	otherwise specified.

15

Documentation/technical/protocol-v2.txt

View File

 @ -33,8 +33,8 @@ In protocol v2 these special packets will have the following semantics:
   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
   * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
   * '0002' Message Packet (response-end-pkt) - indicates the end of a response
     for stateless connections
   * '0002' Response End Packet (response-end-pkt) - indicates the end of a
     response for stateless connections
 Initial Client Request
 ----------------------
 @ -192,11 +192,20 @@ ls-refs takes in the following arguments:
 	When specified, only references having a prefix matching one of
 	the provided prefixes are displayed.
 If the 'unborn' feature is advertised the following argument can be
 included in the client's request.
     unborn
 	The server will send information about HEAD even if it is a symref
 	pointing to an unborn branch in the form "unborn HEAD
 	symref-target:<target>".
 The output of ls-refs is as follows:
     output = *ref
 	     flush-pkt
     ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
     obj-id-or-unborn = (obj-id | "unborn")
     ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
     ref-attribute = (symref | peeled)
     symref = "symref-target:" symref-target
     peeled = "peeled:" obj-id

42

Documentation/technical/reftable.txt

View File

 @ -872,17 +872,11 @@ A repository must set its `$GIT_DIR/config` to configure reftable:
 Layout
 ^^^^^^
 A collection of reftable files are stored in the `$GIT_DIR/reftable/`
 directory:
 ....
 00000001-00000001.log
 00000002-00000002.ref
 00000003-00000003.ref
 ....
 where reftable files are named by a unique name such as produced by the
 function `${min_update_index}-${max_update_index}.ref`.
 A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
 Their names should have a random element, such that each filename is globally
 unique; this helps avoid spurious failures on Windows, where open files cannot
 be removed or overwritten. It suggested to use
 `${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
 Log-only files use the `.log` extension, while ref-only and mixed ref
 and log files use `.ref`. extension.
 @ -893,9 +887,9 @@ current files, one per line, in order, from oldest (base) to newest
 ....
 $ cat .git/reftable/tables.list
 00000001-00000001.log
 00000002-00000002.ref
 00000003-00000003.ref
 00000001-00000001-RANDOM1.log
 00000002-00000002-RANDOM2.ref
 00000003-00000003-RANDOM3.ref
 ....
 Readers must read `$GIT_DIR/reftable/tables.list` to determine which
 @ -940,7 +934,7 @@ new reftable and atomically appending it to the stack:
 .  Select `update_index` to be most recent file's
 `max_update_index + 1`.
 .  Prepare temp reftable `tmp_XXXXXX`, including log entries.
 .  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
 .  Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
 .  Copy `tables.list` to `tables.list.lock`, appending file from (5).
 .  Rename `tables.list.lock` to `tables.list`.
 @ -993,7 +987,7 @@ prevents other processes from trying to compact these files.
 should always be the case, assuming that other processes are adhering to
 the locking protocol.
 .  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
 `${min_update_index}-${max_update_index}.ref`.
 `${min_update_index}-${max_update_index}-${random}.ref`.
 .  Write the new stack to `tables.list.lock`, replacing `B` and `C`
 with the file from (4).
 .  Rename `tables.list.lock` to `tables.list`.
 @ -1005,6 +999,22 @@ This strategy permits compactions to proceed independently of updates.
 Each reftable (compacted or not) is uniquely identified by its name, so
 open reftables can be cached by their name.
 Windows
 ^^^^^^^
 On windows, and other systems that do not allow deleting or renaming to open
 files, compaction may succeed, but other readers may prevent obsolete tables
 from being deleted.
 On these platforms, the following strategy can be followed: on closing a
 reftable stack, reload `tables.list`, and delete any tables no longer mentioned
 in `tables.list`.
 Irregular program exit may still leave about unused files. In this case, a
 cleanup operation can read `tables.list`, note its modification timestamp, and
 delete any unreferenced `*.ref` files that are older.
 Alternatives considered
 ~~~~~~~~~~~~~~~~~~~~~~~

2

GIT-VERSION-GEN

View File

 @ -1,7 +1,7 @@
 #!/bin/sh
 GVF=GIT-VERSION-FILE
 DEF_VER=v2.30.6
 DEF_VER=v2.31.7
 LF='
 '

									
										55

Makefile
									
												View File
												
				@ -22,6 +22,9 @@ all::

				# when attempting to read from an fopen'ed directory (or even to fopen

				# it at all).

				#

				# Define OPEN_RETURNS_EINTR if your open() system call may return EINTR

				# when a signal is received (as opposed to restarting).

				#

				# Define NO_OPENSSL environment variable if you do not have OpenSSL.

				#

				# Define USE_LIBPCRE if you have and want to use libpcre. Various

				@ -29,18 +32,11 @@ all::

				# Perl-compatible regular expressions instead of standard or extended

				# POSIX regular expressions.

				#

				# USE_LIBPCRE is a synonym for USE_LIBPCRE2, define USE_LIBPCRE1

				# instead if you'd like to use the legacy version 1 of the PCRE

				# library. Support for version 1 will likely be removed in some future

				# release of Git, as upstream has all but abandoned it.

				#

				# When using USE_LIBPCRE1, define NO_LIBPCRE1_JIT if you want to

				# disable JIT even if supported by your library.

				# Only libpcre version 2 is supported. USE_LIBPCRE2 is a synonym for

				# USE_LIBPCRE, support for the old USE_LIBPCRE1 has been removed.

				#

				# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are

				# in /foo/bar/include and /foo/bar/lib directories. Which version of

				# PCRE this points to determined by the USE_LIBPCRE1 and USE_LIBPCRE2

				# variables.

				# in /foo/bar/include and /foo/bar/lib directories.

				#

				# Define HAVE_ALLOCA_H if you have working alloca(3) defined in that header.

				#

				@ -722,6 +718,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o

				TEST_BUILTINS_OBJS += test-parse-options.o

				TEST_BUILTINS_OBJS += test-parse-pathspec-file.o

				TEST_BUILTINS_OBJS += test-path-utils.o

				TEST_BUILTINS_OBJS += test-pcre2-config.o

				TEST_BUILTINS_OBJS += test-pkt-line.o

				TEST_BUILTINS_OBJS += test-prio-queue.o

				TEST_BUILTINS_OBJS += test-proc-receive.o

				@ -840,6 +837,7 @@ LIB_OBJS += bundle.o

				LIB_OBJS += cache-tree.o

				LIB_OBJS += chdir-notify.o

				LIB_OBJS += checkout.o

				LIB_OBJS += chunk-format.o

				LIB_OBJS += color.o

				LIB_OBJS += column.o

				LIB_OBJS += combine-diff.o

				@ -860,6 +858,7 @@ LIB_OBJS += date.o

				LIB_OBJS += decorate.o

				LIB_OBJS += delta-islands.o

				LIB_OBJS += diff-delta.o

				LIB_OBJS += diff-merges.o

				LIB_OBJS += diff-lib.o

				LIB_OBJS += diff-no-index.o

				LIB_OBJS += diff.o

				@ -868,6 +867,7 @@ LIB_OBJS += diffcore-delta.o

				LIB_OBJS += diffcore-order.o

				LIB_OBJS += diffcore-pickaxe.o

				LIB_OBJS += diffcore-rename.o

				LIB_OBJS += diffcore-rotate.o

				LIB_OBJS += dir-iterator.o

				LIB_OBJS += dir.o

				LIB_OBJS += editor.o

				@ -887,6 +887,7 @@ LIB_OBJS += gettext.o

				LIB_OBJS += gpg-interface.o

				LIB_OBJS += graph.o

				LIB_OBJS += grep.o

				LIB_OBJS += hash-lookup.o

				LIB_OBJS += hashmap.o

				LIB_OBJS += help.o

				LIB_OBJS += hex.o

				@ -923,6 +924,8 @@ LIB_OBJS += notes-cache.o

				LIB_OBJS += notes-merge.o

				LIB_OBJS += notes-utils.o

				LIB_OBJS += notes.o

				LIB_OBJS += object-file.o

				LIB_OBJS += object-name.o

				LIB_OBJS += object.o

				LIB_OBJS += oid-array.o

				LIB_OBJS += oidmap.o

				@ -979,9 +982,6 @@ LIB_OBJS += sequencer.o

				LIB_OBJS += serve.o

				LIB_OBJS += server-info.o

				LIB_OBJS += setup.o

				LIB_OBJS += sha1-file.o

				LIB_OBJS += sha1-lookup.o

				LIB_OBJS += sha1-name.o

				LIB_OBJS += shallow.o

				LIB_OBJS += sideband.o

				LIB_OBJS += sigchain.o

				@ -1359,26 +1359,17 @@ ifdef NO_LIBGEN_H

					COMPAT_OBJS += compat/basename.o

				endif

				ifdef USE_LIBPCRE1

				$(error The USE_LIBPCRE1 build option has been removed, use version 2 with USE_LIBPCRE)

				endif

				USE_LIBPCRE2 ?= $(USE_LIBPCRE)

				ifneq (,$(USE_LIBPCRE2))

					ifdef USE_LIBPCRE1

				$(error Only set USE_LIBPCRE2 (or its alias USE_LIBPCRE) or USE_LIBPCRE1, not both!)

					endif

					BASIC_CFLAGS += -DUSE_LIBPCRE2

					EXTLIBS += -lpcre2-8

				endif

				ifdef USE_LIBPCRE1

					BASIC_CFLAGS += -DUSE_LIBPCRE1

					EXTLIBS += -lpcre

				ifdef NO_LIBPCRE1_JIT

					BASIC_CFLAGS += -DNO_LIBPCRE1_JIT

				endif

				endif

				ifdef LIBPCREDIR

					BASIC_CFLAGS += -I$(LIBPCREDIR)/include

					EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)

				@ -1551,6 +1542,10 @@ ifdef FREAD_READS_DIRECTORIES

					COMPAT_CFLAGS += -DFREAD_READS_DIRECTORIES

					COMPAT_OBJS += compat/fopen.o

				endif

				ifdef OPEN_RETURNS_EINTR

					COMPAT_CFLAGS += -DOPEN_RETURNS_EINTR

					COMPAT_OBJS += compat/open.o

				endif

				ifdef NO_SYMLINK_HEAD

					BASIC_CFLAGS += -DNO_SYMLINK_HEAD

				endif

				@ -2726,9 +2721,7 @@ GIT-BUILD-OPTIONS: FORCE

					@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+

					@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+

					@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+

					@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE1)))'\' >>$@+

					@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+

					@echo NO_LIBPCRE1_JIT=\''$(subst ','\'',$(subst ','\'',$(NO_LIBPCRE1_JIT)))'\' >>$@+

					@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+

					@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+

					@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+

				@ -3306,11 +3299,11 @@ cover_db_html: cover_db

				# are not necessarily appropriate for general builds, and that vary greatly

				# depending on the compiler version used.

				#

				# An example command to build against libFuzzer from LLVM 4.0.0:

				# An example command to build against libFuzzer from LLVM 11.0.0:

				#

				# make CC=clang CXX=clang++ \

				#      CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \

				#      LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \

				#      CFLAGS="-fsanitize=fuzzer-no-link,address" \

				#      LIB_FUZZING_ENGINE="-fsanitize=fuzzer" \

				#      fuzz-all

				#

				FUZZ_CXXFLAGS ?= $(CFLAGS)

2

RelNotes

View File

 @ -1 +1 @@
 Documentation/RelNotes/2.30.6.txt
 Documentation/RelNotes/2.31.7.txt

									
										64

abspath.c
									
												View File
												
				@ -67,19 +67,15 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)

				#endif

				/*

				 * Return the real path (i.e., absolute path, with symlinks resolved

				 * and extra slashes removed) equivalent to the specified path.  (If

				 * you want an absolute path but don't mind links, use

				 * absolute_path().)  Places the resolved realpath in the provided strbuf.

				 *

				 * The directory part of path (i.e., everything up to the last

				 * dir_sep) must denote a valid, existing directory, but the last

				 * component need not exist.  If die_on_error is set, then die with an

				 * informative error message if there is a problem.  Otherwise, return

				 * NULL on errors (without generating any output).

				 * If set, any number of trailing components may be missing; otherwise, only one

				 * may be.

				 */

				char *strbuf_realpath(struct strbuf *resolved, const char *path,

						      int die_on_error)

				#define REALPATH_MANY_MISSING (1 << 0)

				/* Should we die if there's an error? */

				#define REALPATH_DIE_ON_ERROR (1 << 1)

				static char *strbuf_realpath_1(struct strbuf *resolved, const char *path,

							       int flags)

				{

					struct strbuf remaining = STRBUF_INIT;

					struct strbuf next = STRBUF_INIT;

				@ -89,7 +85,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,

					struct stat st;

					if (!*path) {

						if (die_on_error)

						if (flags & REALPATH_DIE_ON_ERROR)

							die("The empty string is not a valid path");

						else

							goto error_out;

				@ -101,7 +97,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,

					if (!resolved->len) {

						/* relative path; can use CWD as the initial resolved path */

						if (strbuf_getcwd(resolved)) {

							if (die_on_error)

							if (flags & REALPATH_DIE_ON_ERROR)

								die_errno("unable to get current working directory");

							else

								goto error_out;

				@ -129,8 +125,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,

						if (lstat(resolved->buf, &st)) {

							/* error out unless this was the last component */

							if (errno != ENOENT || remaining.len) {

								if (die_on_error)

							if (errno != ENOENT ||

							   (!(flags & REALPATH_MANY_MISSING) && remaining.len)) {

								if (flags & REALPATH_DIE_ON_ERROR)

									die_errno("Invalid path '%s'",

										  resolved->buf);

								else

				@ -143,7 +140,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,

							if (num_symlinks++ > MAXSYMLINKS) {

								errno = ELOOP;

								if (die_on_error)

								if (flags & REALPATH_DIE_ON_ERROR)

									die("More than %d nested symlinks "

									    "on path '%s'", MAXSYMLINKS, path);

								else

				@ -153,7 +150,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,

							len = strbuf_readlink(&symlink, resolved->buf,

									      st.st_size);

							if (len < 0) {

								if (die_on_error)

								if (flags & REALPATH_DIE_ON_ERROR)

									die_errno("Invalid symlink '%s'",

										  resolved->buf);

								else

				@ -202,6 +199,37 @@ error_out:

					return retval;

				}

				/*

				 * Return the real path (i.e., absolute path, with symlinks resolved

				 * and extra slashes removed) equivalent to the specified path.  (If

				 * you want an absolute path but don't mind links, use

				 * absolute_path().)  Places the resolved realpath in the provided strbuf.

				 *

				 * The directory part of path (i.e., everything up to the last

				 * dir_sep) must denote a valid, existing directory, but the last

				 * component need not exist.  If die_on_error is set, then die with an

				 * informative error message if there is a problem.  Otherwise, return

				 * NULL on errors (without generating any output).

				 */

				char *strbuf_realpath(struct strbuf *resolved, const char *path,

						      int die_on_error)

				{

					return strbuf_realpath_1(resolved, path,

								 die_on_error ? REALPATH_DIE_ON_ERROR : 0);

				}

				/*

				 * Just like strbuf_realpath, but allows an arbitrary number of path

				 * components to be missing.

				 */

				char *strbuf_realpath_forgiving(struct strbuf *resolved, const char *path,

								int die_on_error)

				{

					return strbuf_realpath_1(resolved, path,

								 ((die_on_error ? REALPATH_DIE_ON_ERROR : 0) |

								  REALPATH_MANY_MISSING));

				}

				char *real_pathdup(const char *path, int die_on_error)

				{

					struct strbuf realpath = STRBUF_INIT;

									
										6

add-interactive.c
									
												View File
												
				@ -413,7 +413,7 @@ struct file_item {

				static void add_file_item(struct string_list *files, const char *name)

				{

					struct file_item *item = xcalloc(sizeof(*item), 1);

					struct file_item *item = xcalloc(1, sizeof(*item));

					string_list_append(files, name)->util = item;

				}

				@ -476,7 +476,7 @@ static void collect_changes_cb(struct diff_queue_struct *q,

							add_file_item(s->files, name);

							entry = xcalloc(sizeof(*entry), 1);

							CALLOC_ARRAY(entry, 1);

							hashmap_entry_init(&entry->ent, hash);

							entry->name = s->files->items[s->files->nr - 1].string;

							entry->item = s->files->items[s->files->nr - 1].util;

				@ -1120,7 +1120,7 @@ int run_add_i(struct repository *r, const struct pathspec *ps)

					int res = 0;

					for (i = 0; i < ARRAY_SIZE(command_list); i++) {

						struct command_item *util = xcalloc(sizeof(*util), 1);

						struct command_item *util = xcalloc(1, sizeof(*util));

						util->command = command_list[i].command;

						string_list_append(&commands.items, command_list[i].string)

							->util = util;

									
										33

apply.c
									
												View File
												
				@ -1781,7 +1781,7 @@ static int parse_single_patch(struct apply_state *state,

						struct fragment *fragment;

						int len;

						fragment = xcalloc(1, sizeof(*fragment));

						CALLOC_ARRAY(fragment, 1);

						fragment->linenr = state->linenr;

						len = parse_fragment(state, line, size, patch, fragment);

						if (len <= 0) {

				@ -1959,7 +1959,7 @@ static struct fragment *parse_binary_hunk(struct apply_state *state,

						size -= llen;

					}

					frag = xcalloc(1, sizeof(*frag));

					CALLOC_ARRAY(frag, 1);

					frag->patch = inflate_it(data, hunk_size, origlen);

					frag->free_patch = 1;

					if (!frag->patch)

				@ -4400,6 +4400,33 @@ static int create_one_file(struct apply_state *state,

					if (state->cached)

						return 0;

					/*

					 * We already try to detect whether files are beyond a symlink in our

					 * up-front checks. But in the case where symlinks are created by any

					 * of the intermediate hunks it can happen that our up-front checks

					 * didn't yet see the symlink, but at the point of arriving here there

					 * in fact is one. We thus repeat the check for symlinks here.

					 *

					 * Note that this does not make the up-front check obsolete as the

					 * failure mode is different:

					 *

					 * - The up-front checks cause us to abort before we have written

					 *   anything into the working directory. So when we exit this way the

					 *   working directory remains clean.

					 *

					 * - The checks here happen in the middle of the action where we have

					 *   already started to apply the patch. The end result will be a dirty

					 *   working directory.

					 *

					 * Ideally, we should update the up-front checks to catch what would

					 * happen when we apply the patch before we damage the working tree.

					 * We have all the information necessary to do so.  But for now, as a

					 * part of embargoed security work, having this check would serve as a

					 * reasonable first step.

					 */

					if (path_is_beyond_symlink(state, path))

						return error(_("affected file '%s' is beyond a symbolic link"), path);

					res = try_create_file(state, path, mode, buf, size);

					if (res < 0)

						return -1;

				@ -4681,7 +4708,7 @@ static int apply_patch(struct apply_state *state,

						struct patch *patch;

						int nr;

						patch = xcalloc(1, sizeof(*patch));

						CALLOC_ARRAY(patch, 1);

						patch->inaccurate_eof = !!(options & APPLY_OPT_INACCURATE_EOF);

						patch->recount =  !!(options & APPLY_OPT_RECOUNT);

						nr = parse_chunk(state, buf.buf + offset, buf.len - offset, patch);

									
										2

archive-tar.c
									
												View File
												
				@ -371,7 +371,7 @@ static int tar_filter_config(const char *var, const char *value, void *data)

					ar = find_tar_filter(name, namelen);

					if (!ar) {

						ar = xcalloc(1, sizeof(*ar));

						CALLOC_ARRAY(ar, 1);

						ar->name = xmemdupz(name, namelen);

						ar->write_archive = write_tar_filter_archive;

						ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS |

									
										107

attr.c
									
												View File
												
				@ -28,7 +28,7 @@ static const char git_attr__unknown[] = "(builtin)unknown";

				#endif

				struct git_attr {

					int attr_nr; /* unique attribute number */

					unsigned int attr_nr; /* unique attribute number */

					char name[FLEX_ARRAY]; /* attribute name */

				};

				@ -210,7 +210,7 @@ static void report_invalid_attr(const char *name, size_t len,

				 * dictionary.  If no entry is found, create a new attribute and store it in

				 * the dictionary.

				 */

				static const struct git_attr *git_attr_internal(const char *name, int namelen)

				static const struct git_attr *git_attr_internal(const char *name, size_t namelen)

				{

					struct git_attr *a;

				@ -226,8 +226,8 @@ static const struct git_attr *git_attr_internal(const char *name, int namelen)

						a->attr_nr = hashmap_get_size(&g_attr_hashmap.map);

						attr_hashmap_add(&g_attr_hashmap, a->name, namelen, a);

						assert(a->attr_nr ==

						       (hashmap_get_size(&g_attr_hashmap.map) - 1));

						if (a->attr_nr != hashmap_get_size(&g_attr_hashmap.map) - 1)

							die(_("unable to add additional attribute"));

					}

					hashmap_unlock(&g_attr_hashmap);

				@ -272,7 +272,7 @@ struct match_attr {

						const struct git_attr *attr;

					} u;

					char is_macro;

					unsigned num_attr;

					size_t num_attr;

					struct attr_state state[FLEX_ARRAY];

				};

				@ -289,7 +289,7 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,

							      struct attr_state *e)

				{

					const char *ep, *equals;

					int len;

					size_t len;

					ep = cp + strcspn(cp, blank);

					equals = strchr(cp, '=');

				@ -333,8 +333,7 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,

				static struct match_attr *parse_attr_line(const char *line, const char *src,

									  int lineno, int macro_ok)

				{

					int namelen;

					int num_attr, i;

					size_t namelen, num_attr, i;

					const char *cp, *name, *states;

					struct match_attr *res = NULL;

					int is_macro;

				@ -345,6 +344,11 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,

						return NULL;

					name = cp;

					if (strlen(line) >= ATTR_MAX_LINE_LENGTH) {

						warning(_("ignoring overly long attributes line %d"), lineno);

						return NULL;

					}

					if (*cp == '"' && !unquote_c_style(&pattern, name, &states)) {

						name = pattern.buf;

						namelen = pattern.len;

				@ -381,10 +385,9 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,

							goto fail_return;

					}

					res = xcalloc(1,

						      sizeof(*res) +

						      sizeof(struct attr_state) * num_attr +

						      (is_macro ? 0 : namelen + 1));

					res = xcalloc(1, st_add3(sizeof(*res),

								 st_mult(sizeof(struct attr_state), num_attr),

								 is_macro ? 0 : namelen + 1));

					if (is_macro) {

						res->u.attr = git_attr_internal(name, namelen);

					} else {

				@ -447,11 +450,12 @@ struct attr_stack {

				static void attr_stack_free(struct attr_stack *e)

				{

					int i;

					unsigned i;

					free(e->origin);

					for (i = 0; i < e->num_matches; i++) {

						struct match_attr *a = e->attrs[i];

						int j;

						size_t j;

						for (j = 0; j < a->num_attr; j++) {

							const char *setto = a->state[j].setto;

							if (setto == ATTR__TRUE ||

				@ -569,7 +573,7 @@ struct attr_check *attr_check_initl(const char *one, ...)

					check = attr_check_alloc();

					check->nr = cnt;

					check->alloc = cnt;

					check->items = xcalloc(cnt, sizeof(struct attr_check_item));

					CALLOC_ARRAY(check->items, cnt);

					check->items[0].attr = git_attr(one);

					va_start(params, one);

				@ -660,8 +664,8 @@ static void handle_attr_line(struct attr_stack *res,

					a = parse_attr_line(line, src, lineno, macro_ok);

					if (!a)

						return;

					ALLOC_GROW(res->attrs, res->num_matches + 1, res->alloc);

					res->attrs[res->num_matches++] = a;

					ALLOC_GROW_BY(res->attrs, res->num_matches, 1, res->alloc);

					res->attrs[res->num_matches - 1] = a;

				}

				static struct attr_stack *read_attr_from_array(const char **list)

				@ -670,7 +674,7 @@ static struct attr_stack *read_attr_from_array(const char **list)

					const char *line;

					int lineno = 0;

					res = xcalloc(1, sizeof(*res));

					CALLOC_ARRAY(res, 1);

					while ((line = *(list++)) != NULL)

						handle_attr_line(res, line, "[builtin]", ++lineno, 1);

					return res;

				@ -700,21 +704,37 @@ void git_attr_set_direction(enum git_attr_direction new_direction)

				static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)

				{

					struct strbuf buf = STRBUF_INIT;

					FILE *fp = fopen_or_warn(path, "r");

					struct attr_stack *res;

					char buf[2048];

					int lineno = 0;

					int fd;

					struct stat st;

					if (!fp)

						return NULL;

					res = xcalloc(1, sizeof(*res));

					while (fgets(buf, sizeof(buf), fp)) {

						char *bufp = buf;

						if (!lineno)

							skip_utf8_bom(&bufp, strlen(bufp));

						handle_attr_line(res, bufp, path, ++lineno, macro_ok);

					fd = fileno(fp);

					if (fstat(fd, &st)) {

						warning_errno(_("cannot fstat gitattributes file '%s'"), path);

						fclose(fp);

						return NULL;

					}

					if (st.st_size >= ATTR_MAX_FILE_SIZE) {

						warning(_("ignoring overly large gitattributes file '%s'"), path);

						fclose(fp);

						return NULL;

					}

					CALLOC_ARRAY(res, 1);

					while (strbuf_getline(&buf, fp) != EOF) {

						if (!lineno && starts_with(buf.buf, utf8_bom))

							strbuf_remove(&buf, 0, strlen(utf8_bom));

						handle_attr_line(res, buf.buf, path, ++lineno, macro_ok);

					}

					fclose(fp);

					strbuf_release(&buf);

					return res;

				}

				@ -725,15 +745,20 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,

					struct attr_stack *res;

					char *buf, *sp;

					int lineno = 0;

					unsigned long size;

					if (!istate)

						return NULL;

					buf = read_blob_data_from_index(istate, path, NULL);

					buf = read_blob_data_from_index(istate, path, &size);

					if (!buf)

						return NULL;

					if (size >= ATTR_MAX_FILE_SIZE) {

						warning(_("ignoring overly large gitattributes blob '%s'"), path);

						return NULL;

					}

					res = xcalloc(1, sizeof(*res));

					CALLOC_ARRAY(res, 1);

					for (sp = buf; *sp; ) {

						char *ep;

						int more;

				@ -774,7 +799,7 @@ static struct attr_stack *read_attr(const struct index_state *istate,

					}

					if (!res)

						res = xcalloc(1, sizeof(*res));

						CALLOC_ARRAY(res, 1);

					return res;

				}

				@ -874,7 +899,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,

					else

						e = NULL;

					if (!e)

						e = xcalloc(1, sizeof(struct attr_stack));

						CALLOC_ARRAY(e, 1);

					push_stack(stack, e, NULL, 0);

				}

				@ -1001,12 +1026,12 @@ static int macroexpand_one(struct all_attrs_item *all_attrs, int nr, int rem);

				static int fill_one(const char *what, struct all_attrs_item *all_attrs,

						    const struct match_attr *a, int rem)

				{

					int i;

					size_t i;

					for (i = a->num_attr - 1; rem > 0 && i >= 0; i--) {

						const struct git_attr *attr = a->state[i].attr;

					for (i = a->num_attr; rem > 0 && i > 0; i--) {

						const struct git_attr *attr = a->state[i - 1].attr;

						const char **n = &(all_attrs[attr->attr_nr].value);

						const char *v = a->state[i].setto;

						const char *v = a->state[i - 1].setto;

						if (*n == ATTR__UNKNOWN) {

							debug_set(what,

				@ -1025,11 +1050,11 @@ static int fill(const char *path, int pathlen, int basename_offset,

						struct all_attrs_item *all_attrs, int rem)

				{

					for (; rem > 0 && stack; stack = stack->prev) {

						int i;

						unsigned i;

						const char *base = stack->origin ? stack->origin : "";

						for (i = stack->num_matches - 1; 0 < rem && 0 <= i; i--) {

							const struct match_attr *a = stack->attrs[i];

						for (i = stack->num_matches; 0 < rem && 0 < i; i--) {

							const struct match_attr *a = stack->attrs[i - 1];

							if (a->is_macro)

								continue;

							if (path_matches(path, pathlen, basename_offset,

				@ -1060,11 +1085,11 @@ static void determine_macros(struct all_attrs_item *all_attrs,

							     const struct attr_stack *stack)

				{

					for (; stack; stack = stack->prev) {

						int i;

						for (i = stack->num_matches - 1; i >= 0; i--) {

							const struct match_attr *ma = stack->attrs[i];

						unsigned i;

						for (i = stack->num_matches; i > 0; i--) {

							const struct match_attr *ma = stack->attrs[i - 1];

							if (ma->is_macro) {

								int n = ma->u.attr->attr_nr;

								unsigned int n = ma->u.attr->attr_nr;

								if (!all_attrs[n].macro) {

									all_attrs[n].macro = ma;

								}

				@ -1116,7 +1141,7 @@ void git_check_attr(const struct index_state *istate,

					collect_some_attrs(istate, path, check);

					for (i = 0; i < check->nr; i++) {

						size_t n = check->items[i].attr->attr_nr;

						unsigned int n = check->items[i].attr->attr_nr;

						const char *value = check->all_attrs[n].value;

						if (value == ATTR__UNKNOWN)

							value = ATTR__UNSET;

									
										12

attr.h
									
												View File
												
				@ -107,6 +107,18 @@

				 * - Free the `attr_check` struct by calling `attr_check_free()`.

				 */

				/**

				 * The maximum line length for a gitattributes file. If the line exceeds this

				 * length we will ignore it.

				 */

				#define ATTR_MAX_LINE_LENGTH 2048

				 /**

				  * The maximum size of the giattributes file. If the file exceeds this size we

				  * will ignore it.

				  */

				#define ATTR_MAX_FILE_SIZE (100 * 1024 * 1024)

				struct index_state;

				/**

									
										6

bisect.c
									
												View File
												
				@ -6,7 +6,7 @@

				#include "refs.h"

				#include "list-objects.h"

				#include "quote.h"

				#include "sha1-lookup.h"

				#include "hash-lookup.h"

				#include "run-command.h"

				#include "log-tree.h"

				#include "bisect.h"

				@ -423,7 +423,7 @@ void find_bisection(struct commit_list **commit_list, int *reaches,

					show_list("bisection 2 sorted", 0, nr, list);

					*all = nr;

					weights = xcalloc(on_list, sizeof(*weights));

					CALLOC_ARRAY(weights, on_list);

					/* Do the real work of finding bisection commit. */

					best = do_find_bisection(list, nr, weights, bisect_flags);

				@ -1064,7 +1064,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)

					if (!all) {

						fprintf(stderr, _("No testable commit found.\n"

							"Maybe you started with bad path parameters?\n"));

							"Maybe you started with bad path arguments?\n"));

						return BISECT_NO_TESTABLE_COMMIT;

					}

									
										17

blame.c
									
												View File
												
				@ -951,13 +951,13 @@ static int *fuzzy_find_matching_lines(struct blame_origin *parent,

					max_search_distance_b = ((2 * max_search_distance_a + 1) * length_b

								 - 1) / length_a;

					result = xcalloc(sizeof(int), length_b);

					second_best_result = xcalloc(sizeof(int), length_b);

					certainties = xcalloc(sizeof(int), length_b);

					CALLOC_ARRAY(result, length_b);

					CALLOC_ARRAY(second_best_result, length_b);

					CALLOC_ARRAY(certainties, length_b);

					/* See get_similarity() for details of similarities. */

					similarity_count = length_b * (max_search_distance_a * 2 + 1);

					similarities = xcalloc(sizeof(int), similarity_count);

					CALLOC_ARRAY(similarities, similarity_count);

					for (i = 0; i < length_b; ++i) {

						result[i] = -1;

				@ -995,7 +995,7 @@ static void fill_origin_fingerprints(struct blame_origin *o)

						return;

					o->num_lines = find_line_starts(&line_starts, o->file.ptr,

									o->file.size);

					o->fingerprints = xcalloc(sizeof(struct fingerprint), o->num_lines);

					CALLOC_ARRAY(o->fingerprints, o->num_lines);

					get_line_fingerprints(o->fingerprints, o->file.ptr, line_starts,

							      0, o->num_lines);

					free(line_starts);

				@ -1853,8 +1853,7 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,

					diffp = NULL;

					if (ignore_diffs && same - tlno > 0) {

						line_blames = xcalloc(sizeof(struct blame_line_tracker),

								      same - tlno);

						CALLOC_ARRAY(line_blames, same - tlno);

						guess_line_blames(parent, target, tlno, offset, same,

								  parent_len, line_blames);

					}

				@ -2216,7 +2215,7 @@ static struct blame_list *setup_blame_list(struct blame_entry *unblamed,

					for (e = unblamed, num_ents = 0; e; e = e->next)

						num_ents++;

					if (num_ents) {

						blame_list = xcalloc(num_ents, sizeof(struct blame_list));

						CALLOC_ARRAY(blame_list, num_ents);

						for (e = unblamed, i = 0; e; e = e->next)

							blame_list[i++].ent = e;

					}

				@ -2428,7 +2427,7 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,

					else if (num_sg < ARRAY_SIZE(sg_buf))

						memset(sg_buf, 0, sizeof(sg_buf));

					else

						sg_origin = xcalloc(num_sg, sizeof(*sg_origin));

						CALLOC_ARRAY(sg_origin, num_sg);

					/*

					 * The first pass looks for unrenamed path to optimize for

									
										2

block-sha1/sha1.c
									
												View File
												
				@ -70,7 +70,7 @@

				 * the input data, the next mix it from the 512-bit array.

				 */

				#define SHA_SRC(t) get_be32((unsigned char *) block + (t)*4)

				#define SHA_MIX(t) SHA_ROL(W((t)+13) ^ W((t)+8) ^ W((t)+2) ^ W(t), 1);

				#define SHA_MIX(t) SHA_ROL(W((t)+13) ^ W((t)+8) ^ W((t)+2) ^ W(t), 1)

				#define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \

					unsigned int TEMP = input(t); setW(t, TEMP); \

									
										2

bloom.c
									
												View File
												
				@ -277,7 +277,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,

								*computed |= BLOOM_TRUNC_EMPTY;

							filter->len = 1;

						}

						filter->data = xcalloc(filter->len, sizeof(unsigned char));

						CALLOC_ARRAY(filter->data, filter->len);

						hashmap_for_each_entry(&pathmap, &iter, e, entry) {

							struct bloom_key key;

									
										18

builtin/add.c
									
												View File
												
				@ -38,19 +38,27 @@ struct update_callback_data {

					int add_errors;

				};

				static void chmod_pathspec(struct pathspec *pathspec, char flip)

				static int chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)

				{

					int i;

					int i, ret = 0;

					for (i = 0; i < active_nr; i++) {

						struct cache_entry *ce = active_cache[i];

						int err;

						if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))

							continue;

						if (chmod_cache_entry(ce, flip) < 0)

							fprintf(stderr, "cannot chmod %cx '%s'\n", flip, ce->name);

						if (!show_only)

							err = chmod_cache_entry(ce, flip);

						else

							err = S_ISREG(ce->ce_mode) ? 0 : -1;

						if (err < 0)

							ret = error(_("cannot chmod %cx '%s'"), flip, ce->name);

					}

					return ret;

				}

				static int fix_unmerged_status(struct diff_filepair *p,

				@ -609,7 +617,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)

						exit_status |= add_files(&dir, flags);

					if (chmod_arg && pathspec.nr)

						chmod_pathspec(&pathspec, chmod_arg[0]);

						exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);

					unplug_bulk_checkin();

				finish:

									
										192

builtin/bisect--helper.c
									
												View File
												
				@ -21,16 +21,15 @@ static GIT_PATH_FUNC(git_path_bisect_first_parent, "BISECT_FIRST_PARENT")

				static const char * const git_bisect_helper_usage[] = {

					N_("git bisect--helper --bisect-reset [<commit>]"),

					N_("git bisect--helper --bisect-write [--no-log] <state> <revision> <good_term> <bad_term>"),

					N_("git bisect--helper --bisect-check-and-set-terms <command> <good_term> <bad_term>"),

					N_("git bisect--helper --bisect-next-check <good_term> <bad_term> [<term>]"),

					N_("git bisect--helper --bisect-terms [--term-good | --term-old | --term-bad | --term-new]"),

					N_("git bisect--helper --bisect-start [--term-{new,bad}=<term> --term-{old,good}=<term>]"

									    " [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<paths>...]"),

					N_("git bisect--helper --bisect-next"),

					N_("git bisect--helper --bisect-auto-next"),

					N_("git bisect--helper --bisect-state (bad|new) [<rev>]"),

					N_("git bisect--helper --bisect-state (good|old) [<rev>...]"),

					N_("git bisect--helper --bisect-replay <filename>"),

					N_("git bisect--helper --bisect-skip [(<rev>|<range>)...]"),

					NULL

				};

				@ -875,12 +874,19 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, const char **a

					 */

					for (; argc; argc--, argv++) {

						struct commit *commit;

						if (get_oid(*argv, &oid)){

							error(_("Bad rev input: %s"), *argv);

							oid_array_clear(&revs);

							return BISECT_FAILED;

						}

						oid_array_append(&revs, &oid);

						commit = lookup_commit_reference(the_repository, &oid);

						if (!commit)

							die(_("Bad rev input (not a commit): %s"), *argv);

						oid_array_append(&revs, &commit->object.oid);

					}

					if (strbuf_read_file(&buf, git_path_bisect_expected_rev(), 0) < the_hash_algo->hexsz ||

				@ -904,28 +910,148 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, const char **a

					return bisect_auto_next(terms, NULL);

				}

				static enum bisect_error bisect_log(void)

				{

					int fd, status;

					const char* filename = git_path_bisect_log();

					if (is_empty_or_missing_file(filename))

						return error(_("We are not bisecting."));

					fd = open(filename, O_RDONLY);

					if (fd < 0)

						return BISECT_FAILED;

					status = copy_fd(fd, STDOUT_FILENO);

					close(fd);

					return status ? BISECT_FAILED : BISECT_OK;

				}

				static int process_replay_line(struct bisect_terms *terms, struct strbuf *line)

				{

					const char *p = line->buf + strspn(line->buf, " \t");

					char *word_end, *rev;

					if ((!skip_prefix(p, "git bisect", &p) &&

					!skip_prefix(p, "git-bisect", &p)) || !isspace(*p))

						return 0;

					p += strspn(p, " \t");

					word_end = (char *)p + strcspn(p, " \t");

					rev = word_end + strspn(word_end, " \t");

					*word_end = '\0'; /* NUL-terminate the word */

					get_terms(terms);

					if (check_and_set_terms(terms, p))

						return -1;

					if (!strcmp(p, "start")) {

						struct strvec argv = STRVEC_INIT;

						int res;

						sq_dequote_to_strvec(rev, &argv);

						res = bisect_start(terms, argv.v, argv.nr);

						strvec_clear(&argv);

						return res;

					}

					if (one_of(p, terms->term_good,

					   terms->term_bad, "skip", NULL))

						return bisect_write(p, rev, terms, 0);

					if (!strcmp(p, "terms")) {

						struct strvec argv = STRVEC_INIT;

						int res;

						sq_dequote_to_strvec(rev, &argv);

						res = bisect_terms(terms, argv.nr == 1 ? argv.v[0] : NULL);

						strvec_clear(&argv);

						return res;

					}

					error(_("'%s'?? what are you talking about?"), p);

					return -1;

				}

				static enum bisect_error bisect_replay(struct bisect_terms *terms, const char *filename)

				{

					FILE *fp = NULL;

					enum bisect_error res = BISECT_OK;

					struct strbuf line = STRBUF_INIT;

					if (is_empty_or_missing_file(filename))

						return error(_("cannot read file '%s' for replaying"), filename);

					if (bisect_reset(NULL))

						return BISECT_FAILED;

					fp = fopen(filename, "r");

					if (!fp)

						return BISECT_FAILED;

					while ((strbuf_getline(&line, fp) != EOF) && !res)

						res = process_replay_line(terms, &line);

					strbuf_release(&line);

					fclose(fp);

					if (res)

						return BISECT_FAILED;

					return bisect_auto_next(terms, NULL);

				}

				static enum bisect_error bisect_skip(struct bisect_terms *terms, const char **argv, int argc)

				{

					int i;

					enum bisect_error res;

					struct strvec argv_state = STRVEC_INIT;

					strvec_push(&argv_state, "skip");

					for (i = 0; i < argc; i++) {

						const char *dotdot = strstr(argv[i], "..");

						if (dotdot) {

							struct rev_info revs;

							struct commit *commit;

							init_revisions(&revs, NULL);

							setup_revisions(2, argv + i - 1, &revs, NULL);

							if (prepare_revision_walk(&revs))

								die(_("revision walk setup failed\n"));

							while ((commit = get_revision(&revs)) != NULL)

								strvec_push(&argv_state,

										oid_to_hex(&commit->object.oid));

							reset_revision_walk();

						} else {

							strvec_push(&argv_state, argv[i]);

						}

					}

					res = bisect_state(terms, argv_state.v, argv_state.nr);

					strvec_clear(&argv_state);

					return res;

				}

				int cmd_bisect__helper(int argc, const char **argv, const char *prefix)

				{

					enum {

						BISECT_RESET = 1,

						BISECT_WRITE,

						CHECK_AND_SET_TERMS,

						BISECT_NEXT_CHECK,

						BISECT_TERMS,

						BISECT_START,

						BISECT_AUTOSTART,

						BISECT_NEXT,

						BISECT_AUTO_NEXT,

						BISECT_STATE

						BISECT_STATE,

						BISECT_LOG,

						BISECT_REPLAY,

						BISECT_SKIP

					} cmdmode = 0;

					int res = 0, nolog = 0;

					struct option options[] = {

						OPT_CMDMODE(0, "bisect-reset", &cmdmode,

							 N_("reset the bisection state"), BISECT_RESET),

						OPT_CMDMODE(0, "bisect-write", &cmdmode,

							 N_("write out the bisection state in BISECT_LOG"), BISECT_WRITE),

						OPT_CMDMODE(0, "check-and-set-terms", &cmdmode,

							 N_("check and set terms in a bisection state"), CHECK_AND_SET_TERMS),

						OPT_CMDMODE(0, "bisect-next-check", &cmdmode,

							 N_("check whether bad or good terms exist"), BISECT_NEXT_CHECK),

						OPT_CMDMODE(0, "bisect-terms", &cmdmode,

				@ -934,10 +1060,14 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)

							 N_("start the bisect session"), BISECT_START),

						OPT_CMDMODE(0, "bisect-next", &cmdmode,

							 N_("find the next bisection commit"), BISECT_NEXT),

						OPT_CMDMODE(0, "bisect-auto-next", &cmdmode,

							 N_("verify the next bisection state then checkout the next bisection commit"), BISECT_AUTO_NEXT),

						OPT_CMDMODE(0, "bisect-state", &cmdmode,

							 N_("mark the state of ref (or refs)"), BISECT_STATE),

						OPT_CMDMODE(0, "bisect-log", &cmdmode,

							 N_("list the bisection steps so far"), BISECT_LOG),

						OPT_CMDMODE(0, "bisect-replay", &cmdmode,

							 N_("replay the bisection process from the given file"), BISECT_REPLAY),

						OPT_CMDMODE(0, "bisect-skip", &cmdmode,

							 N_("skip some commits for checkout"), BISECT_SKIP),

						OPT_BOOL(0, "no-log", &nolog,

							 N_("no log for BISECT_WRITE")),

						OPT_END()

				@ -955,18 +1085,7 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)

					case BISECT_RESET:

						if (argc > 1)

							return error(_("--bisect-reset requires either no argument or a commit"));

						return !!bisect_reset(argc ? argv[0] : NULL);

					case BISECT_WRITE:

						if (argc != 4 && argc != 5)

							return error(_("--bisect-write requires either 4 or 5 arguments"));

						set_terms(&terms, argv[3], argv[2]);

						res = bisect_write(argv[0], argv[1], &terms, nolog);

						break;

					case CHECK_AND_SET_TERMS:

						if (argc != 3)

							return error(_("--check-and-set-terms requires 3 arguments"));

						set_terms(&terms, argv[2], argv[1]);

						res = check_and_set_terms(&terms, argv[0]);

						res = bisect_reset(argc ? argv[0] : NULL);

						break;

					case BISECT_NEXT_CHECK:

						if (argc != 2 && argc != 3)

				@ -989,17 +1108,26 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)

						get_terms(&terms);

						res = bisect_next(&terms, prefix);

						break;

					case BISECT_AUTO_NEXT:

						if (argc)

							return error(_("--bisect-auto-next requires 0 arguments"));

						get_terms(&terms);

						res = bisect_auto_next(&terms, prefix);

						break;

					case BISECT_STATE:

						set_terms(&terms, "bad", "good");

						get_terms(&terms);

						res = bisect_state(&terms, argv, argc);

						break;

					case BISECT_LOG:

						if (argc)

							return error(_("--bisect-log requires 0 arguments"));

						res = bisect_log();

						break;

					case BISECT_REPLAY:

						if (argc != 1)

							return error(_("no logfile given"));

						set_terms(&terms, "bad", "good");

						res = bisect_replay(&terms, argv[0]);

						break;

					case BISECT_SKIP:

						set_terms(&terms, "bad", "good");

						res = bisect_skip(&terms, argv, argc);

						break;

					default:

						BUG("unknown subcommand %d", cmdmode);

					}

									
										10

builtin/blame.c
									
												View File
												
				@ -425,13 +425,11 @@ static void setup_default_color_by_age(void)

					parse_color_fields("blue,12 month ago,white,1 month ago,red");

				}

				static void determine_line_heat(struct blame_entry *ent, const char **dest_color)

				static void determine_line_heat(struct commit_info *ci, const char **dest_color)

				{

					int i = 0;

					struct commit_info ci;

					get_commit_info(ent->suspect->commit, &ci, 1);

					while (i < colorfield_nr && ci.author_time > colorfield[i].hop)

					while (i < colorfield_nr && ci->author_time > colorfield[i].hop)

						i++;

					*dest_color = colorfield[i].col;

				@ -453,7 +451,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int

					cp = blame_nth_line(sb, ent->lno);

					if (opt & OUTPUT_SHOW_AGE_WITH_COLOR) {

						determine_line_heat(ent, &default_color);

						determine_line_heat(&ci, &default_color);

						color = default_color;

						reset = GIT_COLOR_RESET;

					}

				@ -1151,7 +1149,7 @@ parse_done:

					sb.xdl_opts = xdl_opts;

					sb.no_whole_file_rename = no_whole_file_rename;

					read_mailmap(&mailmap, NULL);

					read_mailmap(&mailmap);

					sb.found_guilty_entry = &found_guilty_entry;

					sb.found_guilty_entry_data = &pi;

									
										47

builtin/branch.c
									
												View File
												
				@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,

					int remote_branch = 0;

					struct strbuf bname = STRBUF_INIT;

					unsigned allowed_interpret;

					struct string_list refs_to_delete = STRING_LIST_INIT_DUP;

					struct string_list_item *item;

					int branch_name_pos;

					switch (kinds) {

					case FILTER_REFS_REMOTES:

				@ -219,6 +222,7 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,

					default:

						die(_("cannot use -a with -d"));

					}

					branch_name_pos = strcspn(fmt, "%");

					if (!force) {

						head_rev = lookup_commit_reference(the_repository, &head_oid);

				@ -265,30 +269,35 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,

							goto next;

						}

						if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,

							       REF_NO_DEREF)) {

							error(remote_branch

							      ? _("Error deleting remote-tracking branch '%s'")

							      : _("Error deleting branch '%s'"),

							      bname.buf);

							ret = 1;

							goto next;

						}

						if (!quiet) {

							printf(remote_branch

							       ? _("Deleted remote-tracking branch %s (was %s).\n")

							       : _("Deleted branch %s (was %s).\n"),

							       bname.buf,

							       (flags & REF_ISBROKEN) ? "broken"

							       : (flags & REF_ISSYMREF) ? target

							       : find_unique_abbrev(&oid, DEFAULT_ABBREV));

						}

						delete_branch_config(bname.buf);

						item = string_list_append(&refs_to_delete, name);

						item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"

								    : (flags & REF_ISSYMREF) ? target

								    : find_unique_abbrev(&oid, DEFAULT_ABBREV));

					next:

						free(target);

					}

					if (delete_refs(NULL, &refs_to_delete, REF_NO_DEREF))

						ret = 1;

					for_each_string_list_item(item, &refs_to_delete) {

						char *describe_ref = item->util;

						char *name = item->string;

						if (!ref_exists(name)) {

							char *refname = name + branch_name_pos;

							if (!quiet)

								printf(remote_branch

									? _("Deleted remote-tracking branch %s (was %s).\n")

									: _("Deleted branch %s (was %s).\n"),

									name + branch_name_pos, describe_ref);

							delete_branch_config(refname);

						}

						free(describe_ref);

					}

					string_list_clear(&refs_to_delete, 0);

					free(name);

					strbuf_release(&bname);

									
										2

builtin/check-mailmap.c
									
												View File
												
				@ -47,7 +47,7 @@ int cmd_check_mailmap(int argc, const char **argv, const char *prefix)

					if (argc == 0 && !use_stdin)

						die(_("no contacts specified"));

					read_mailmap(&mailmap, NULL);

					read_mailmap(&mailmap);

					for (i = 0; i < argc; ++i)

						check_mailmap(&mailmap, argv[i]);

									
										39

builtin/checkout-index.c
									
												View File
												
				@ -23,22 +23,35 @@ static struct checkout state = CHECKOUT_INIT;

				static void write_tempfile_record(const char *name, const char *prefix)

				{

					int i;

					int have_tempname = 0;

					if (CHECKOUT_ALL == checkout_stage) {

						for (i = 1; i < 4; i++) {

							if (i > 1)

								putchar(' ');

							if (topath[i][0])

								fputs(topath[i], stdout);

							else

								putchar('.');

						}

					} else

						fputs(topath[checkout_stage], stdout);

						for (i = 1; i < 4; i++)

							if (topath[i][0]) {

								have_tempname = 1;

								break;

							}

					putchar('\t');

					write_name_quoted_relative(name, prefix, stdout,

								   nul_term_line ? '\0' : '\n');

						if (have_tempname) {

							for (i = 1; i < 4; i++) {

								if (i > 1)

									putchar(' ');

								if (topath[i][0])

									fputs(topath[i], stdout);

								else

									putchar('.');

							}

						}

					} else if (topath[checkout_stage][0]) {

						have_tempname = 1;

						fputs(topath[checkout_stage], stdout);

					}

					if (have_tempname) {

						putchar('\t');

						write_name_quoted_relative(name, prefix, stdout,

									   nul_term_line ? '\0' : '\n');

					}

					for (i = 0; i < 4; i++) {

						topath[i][0] = 0;

									
										3

builtin/checkout.c
									
												View File
												
				@ -821,9 +821,6 @@ static int merge_working_tree(const struct checkout_opts *opts,

						}

					}

					if (!active_cache_tree)

						active_cache_tree = cache_tree();

					if (!cache_tree_fully_valid(active_cache_tree))

						cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);

									
										2

builtin/clean.c
									
												View File
												
				@ -623,7 +623,7 @@ static int *list_and_choose(struct menu_opts *opts, struct menu_stuff *stuff)

								nr += chosen[i];

						}

						result = xcalloc(st_add(nr, 1), sizeof(int));

						CALLOC_ARRAY(result, st_add(nr, 1));

						for (i = 0; i < stuff->nr && j < nr; i++) {

							if (chosen[i])

								result[j++] = i;

									
										42

builtin/clone.c
									
												View File
												
				@ -981,7 +981,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)

					int err = 0, complete_refs_before_fetch = 1;

					int submodule_progress;

					struct strvec ref_prefixes = STRVEC_INIT;

					struct transport_ls_refs_options transport_ls_refs_options =

						TRANSPORT_LS_REFS_OPTIONS_INIT;

					packet_trace_identity("clone");

				@ -1201,10 +1202,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)

					refspec_appendf(&remote->fetch, "+%s*:%s*", src_ref_prefix,

							branch_top.buf);

					transport = transport_get(remote, remote->url[0]);

					transport_set_verbosity(transport, option_verbosity, option_progress);

					transport->family = family;

					path = get_repo_path(remote->url[0], &is_bundle);

					is_local = option_local != 0 && path && !is_bundle;

					if (is_local) {

				@ -1224,6 +1221,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)

					}

					if (option_local > 0 && !is_local)

						warning(_("--local is ignored"));

					transport = transport_get(remote, path ? path : remote->url[0]);

					transport_set_verbosity(transport, option_verbosity, option_progress);

					transport->family = family;

					transport->cloning = 1;

					transport_set_option(transport, TRANS_OPT_KEEP, "yes");

				@ -1259,14 +1260,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)

						transport->smart_options->check_self_contained_and_connected = 1;

					strvec_push(&ref_prefixes, "HEAD");

					refspec_ref_prefixes(&remote->fetch, &ref_prefixes);

					strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");

					refspec_ref_prefixes(&remote->fetch,

							     &transport_ls_refs_options.ref_prefixes);

					if (option_branch)

						expand_ref_prefix(&ref_prefixes, option_branch);

						expand_ref_prefix(&transport_ls_refs_options.ref_prefixes,

								  option_branch);

					if (!option_no_tags)

						strvec_push(&ref_prefixes, "refs/tags/");

						strvec_push(&transport_ls_refs_options.ref_prefixes,

							    "refs/tags/");

					refs = transport_get_remote_refs(transport, &ref_prefixes);

					refs = transport_get_remote_refs(transport, &transport_ls_refs_options);

					if (refs) {

						int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));

				@ -1328,8 +1332,19 @@ int cmd_clone(int argc, const char **argv, const char *prefix)

						remote_head = NULL;

						option_no_checkout = 1;

						if (!option_bare) {

							const char *branch = git_default_branch_name(0);

							char *ref = xstrfmt("refs/heads/%s", branch);

							const char *branch;

							char *ref;

							if (transport_ls_refs_options.unborn_head_target &&

							    skip_prefix(transport_ls_refs_options.unborn_head_target,

									"refs/heads/", &branch)) {

								ref = transport_ls_refs_options.unborn_head_target;

								transport_ls_refs_options.unborn_head_target = NULL;

								create_symref("HEAD", ref, reflog_msg.buf);

							} else {

								branch = git_default_branch_name(0);

								ref = xstrfmt("refs/heads/%s", branch);

							}

							install_branch_config(0, branch, remote_name, ref);

							free(ref);

				@ -1382,6 +1397,7 @@ cleanup:

					strbuf_release(&key);

					junk_mode = JUNK_LEAVE_ALL;

					strvec_clear(&ref_prefixes);

					strvec_clear(&transport_ls_refs_options.ref_prefixes);

					free(transport_ls_refs_options.unborn_head_target);

					return err;

				}

									
										2

builtin/commit.c
									
												View File
												
				@ -1039,7 +1039,7 @@ static const char *find_author_by_nickname(const char *name)

					av[++ac] = NULL;

					setup_revisions(ac, av, &revs, NULL);

					revs.mailmap = &mailmap;

					read_mailmap(revs.mailmap, NULL);

					read_mailmap(revs.mailmap);

					if (prepare_revision_walk(&revs))

						die(_("revision walk setup failed"));

									
										2

builtin/describe.c
									
												View File
												
				@ -194,7 +194,7 @@ static int get_name(const char *path, const struct object_id *oid, int flag, voi

					}

					/* Is it annotated? */

					if (!peel_ref(path, &peeled)) {

					if (!peel_iterated_oid(oid, &peeled)) {

						is_annotated = !oideq(oid, &peeled);

					} else {

						oidcpy(&peeled, oid);

									
										8

builtin/diff-files.c
									
												View File
												
				@ -7,6 +7,7 @@

				#include "cache.h"

				#include "config.h"

				#include "diff.h"

				#include "diff-merges.h"

				#include "commit.h"

				#include "revision.h"

				#include "builtin.h"

				@ -35,7 +36,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)

					 */

					rev.diffopt.ita_invisible_in_index = 1;

					precompose_argv(argc, argv);

					prefix = precompose_argv_prefix(argc, argv, prefix);

					argc = setup_revisions(argc, argv, &rev, NULL);

					while (1 < argc && argv[1][0] == '-') {

				@ -53,6 +54,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)

					}

					if (!rev.diffopt.output_format)

						rev.diffopt.output_format = DIFF_FORMAT_RAW;

					rev.diffopt.rotate_to_strict = 1;

					/*

					 * Make sure there are NO revision (i.e. pending object) parameter,

				@ -69,9 +71,9 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)

					 * was not asked to.  "diff-files -c -p" should not densify

					 * (the user should ask with "diff-files --cc" explicitly).

					 */

					if (rev.max_count == -1 && !rev.combine_merges &&

					if (rev.max_count == -1 &&

					    (rev.diffopt.output_format & DIFF_FORMAT_PATCH))

						rev.combine_merges = rev.dense_combined_merges = 1;

						diff_merges_set_dense_combined_if_unset(&rev);

					if (read_cache_preload(&rev.diffopt.pathspec) < 0) {

						perror("read_cache_preload");

									
										4

builtin/diff-index.c
									
												View File
												
				@ -25,7 +25,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)

					git_config(git_diff_basic_config, NULL); /* no "diff" UI options */

					repo_init_revisions(the_repository, &rev, prefix);

					rev.abbrev = 0;

					precompose_argv(argc, argv);

					prefix = precompose_argv_prefix(argc, argv, prefix);

					argc = setup_revisions(argc, argv, &rev, NULL);

					for (i = 1; i < argc; i++) {

				@ -41,6 +41,8 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)

					if (!rev.diffopt.output_format)

						rev.diffopt.output_format = DIFF_FORMAT_RAW;

					rev.diffopt.rotate_to_strict = 1;

					/*

					 * Make sure there is one revision (i.e. pending object),

					 * and there is no revision filtering parameters.

									
										5

builtin/diff-tree.c
									
												View File
												
				@ -126,7 +126,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)

					memset(&s_r_opt, 0, sizeof(s_r_opt));

					s_r_opt.tweak = diff_tree_tweak_rev;

					precompose_argv(argc, argv);

					prefix = precompose_argv_prefix(argc, argv, prefix);

					argc = setup_revisions(argc, argv, opt, &s_r_opt);

					memset(&w, 0, sizeof(w));

				@ -156,6 +156,8 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)

					if (merge_base && opt->pending.nr != 2)

						die(_("--merge-base only works with two commits"));

					opt->diffopt.rotate_to_strict = 1;

					/*

					 * NOTE!  We expect "a..b" to expand to "^a b" but it is

					 * perfectly valid for revision range parser to yield "b ^a",

				@ -192,6 +194,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)

						int saved_nrl = 0;

						int saved_dcctc = 0;

						opt->diffopt.rotate_to_strict = 0;

						if (opt->diffopt.detect_rename) {

							if (!the_index.cache)

								repo_read_index(the_repository);

Compare commits

862 Commits v2.30.6 ... v2.31.7

1 .gitattributes vendored Unescape Escape View File

2 .github/workflows/main.yml vendored Unescape Escape View File

2 .travis.yml Unescape Escape View File

154 CODE_OF_CONDUCT.md Unescape Escape View File

1 Documentation/Makefile Unescape Escape View File

2 Documentation/MyFirstContribution.txt Unescape Escape View File

86 Documentation/RelNotes/2.30.7.txt Normal file Unescape Escape View File

52 Documentation/RelNotes/2.30.8.txt Normal file Unescape Escape View File

365 Documentation/RelNotes/2.31.0.txt Normal file Unescape Escape View File

27 Documentation/RelNotes/2.31.1.txt Normal file Unescape Escape View File

6 Documentation/RelNotes/2.31.2.txt Normal file Unescape Escape View File

4 Documentation/RelNotes/2.31.3.txt Normal file Unescape Escape View File

6 Documentation/RelNotes/2.31.4.txt Normal file Unescape Escape View File

5 Documentation/RelNotes/2.31.5.txt Normal file Unescape Escape View File

5 Documentation/RelNotes/2.31.6.txt Normal file Unescape Escape View File

6 Documentation/RelNotes/2.31.7.txt Normal file Unescape Escape View File

2 Documentation/blame-options.txt Unescape Escape View File

4 Documentation/config.txt Unescape Escape View File

2 Documentation/config/core.txt Unescape Escape View File

2 Documentation/config/diff.txt Unescape Escape View File

2 Documentation/config/init.txt Unescape Escape View File

9 Documentation/config/lsrefs.txt Normal file Unescape Escape View File

5 Documentation/config/maintenance.txt Unescape Escape View File

15 Documentation/config/mergetool.txt Unescape Escape View File

7 Documentation/config/pack.txt Unescape Escape View File

3 Documentation/config/rebase.txt Unescape Escape View File

11 Documentation/date-formats.txt Unescape Escape View File

6 Documentation/diff-generate-patch.txt Unescape Escape View File

59 Documentation/diff-options.txt Unescape Escape View File

4 Documentation/fetch-options.txt Unescape Escape View File

2 Documentation/git-am.txt Unescape Escape View File

2 Documentation/git-blame.txt Unescape Escape View File

6 Documentation/git-branch.txt Unescape Escape View File

9 Documentation/git-check-mailmap.txt Unescape Escape View File

16 Documentation/git-config.txt Unescape Escape View File

8 Documentation/git-difftool.txt Unescape Escape View File

8 Documentation/git-for-each-ref.txt Unescape Escape View File

14 Documentation/git-gc.txt Unescape Escape View File

10 Documentation/git-http-fetch.txt Unescape Escape View File

25 Documentation/git-index-pack.txt Unescape Escape View File

46 Documentation/git-log.txt Unescape Escape View File

8 Documentation/git-ls-files.txt Unescape Escape View File

4 Documentation/git-mailinfo.txt Unescape Escape View File

122 Documentation/git-maintenance.txt Unescape Escape View File

4 Documentation/git-mergetool--lib.txt Unescape Escape View File

4 Documentation/git-mergetool.txt Unescape Escape View File

39 Documentation/git-mktag.txt Unescape Escape View File

11 Documentation/git-pack-objects.txt Unescape Escape View File

20 Documentation/git-range-diff.txt Unescape Escape View File

9 Documentation/git-repack.txt Unescape Escape View File

93 Documentation/git-rev-list.txt Unescape Escape View File

74 Documentation/git-rev-parse.txt Unescape Escape View File

8 Documentation/git-shortlog.txt Unescape Escape View File

7 Documentation/git-show.txt Unescape Escape View File

8 Documentation/git-stash.txt Unescape Escape View File

2 Documentation/git-status.txt Unescape Escape View File

2 Documentation/git-tag.txt Unescape Escape View File

79 Documentation/git-worktree.txt Unescape Escape View File

24 Documentation/git.txt Unescape Escape View File

41 Documentation/gitdiffcore.txt Unescape Escape View File

123 Documentation/gitmailmap.txt Normal file Unescape Escape View File

2 Documentation/i18n.txt Unescape Escape View File

75 Documentation/mailmap.txt Unescape Escape View File

34 Documentation/pretty-formats.txt Unescape Escape View File

14 Documentation/rev-list-options.txt Unescape Escape View File

116 Documentation/technical/chunk-format.txt Normal file Unescape Escape View File

31 Documentation/technical/commit-graph-format.txt Unescape Escape View File

77 Documentation/technical/commit-graph.txt Unescape Escape View File

293 Documentation/technical/hash-function-transition.txt Unescape Escape View File

42 Documentation/technical/index-format.txt Unescape Escape View File

23 Documentation/technical/pack-format.txt Unescape Escape View File

15 Documentation/technical/protocol-v2.txt Unescape Escape View File

42 Documentation/technical/reftable.txt Unescape Escape View File

2 GIT-VERSION-GEN Unescape Escape View File

55 Makefile Unescape Escape View File

2 RelNotes Unescape Escape View File

64 abspath.c Unescape Escape View File

6 add-interactive.c Unescape Escape View File

862 Commits

v2.30.6 ... v2.31.7

1

.gitattributes vendored

View File

2

.github/workflows/main.yml vendored

View File

2

.travis.yml

View File

154

CODE_OF_CONDUCT.md

View File

1

Documentation/Makefile

View File

2

Documentation/MyFirstContribution.txt

View File

86

Documentation/RelNotes/2.30.7.txt Normal file

View File

52

Documentation/RelNotes/2.30.8.txt Normal file

View File

365

Documentation/RelNotes/2.31.0.txt Normal file

View File

27

Documentation/RelNotes/2.31.1.txt Normal file

View File

6

Documentation/RelNotes/2.31.2.txt Normal file

View File

4

Documentation/RelNotes/2.31.3.txt Normal file

View File

6

Documentation/RelNotes/2.31.4.txt Normal file

View File

5

Documentation/RelNotes/2.31.5.txt Normal file

View File

5

Documentation/RelNotes/2.31.6.txt Normal file

View File

6

Documentation/RelNotes/2.31.7.txt Normal file

View File

2

Documentation/blame-options.txt

View File

4

Documentation/config.txt

View File

2

Documentation/config/core.txt

View File

2

Documentation/config/diff.txt

View File

2

Documentation/config/init.txt

View File

9

Documentation/config/lsrefs.txt Normal file

View File

5

Documentation/config/maintenance.txt

View File

15

Documentation/config/mergetool.txt

View File

7

Documentation/config/pack.txt

View File

3

Documentation/config/rebase.txt

View File

11

Documentation/date-formats.txt

View File

6

Documentation/diff-generate-patch.txt

View File

59

Documentation/diff-options.txt

View File

4

Documentation/fetch-options.txt

View File

2

Documentation/git-am.txt

View File

2

Documentation/git-blame.txt

View File

6

Documentation/git-branch.txt

View File

9

Documentation/git-check-mailmap.txt

View File

16

Documentation/git-config.txt

View File

8

Documentation/git-difftool.txt

View File

8

Documentation/git-for-each-ref.txt

View File

14

Documentation/git-gc.txt

View File

10

Documentation/git-http-fetch.txt

View File

25

Documentation/git-index-pack.txt

View File

46

Documentation/git-log.txt

View File

8

Documentation/git-ls-files.txt

View File

4

Documentation/git-mailinfo.txt

View File

122

Documentation/git-maintenance.txt

View File

4

Documentation/git-mergetool--lib.txt

View File

4

Documentation/git-mergetool.txt

View File

39

Documentation/git-mktag.txt

View File

11

Documentation/git-pack-objects.txt

View File

20

Documentation/git-range-diff.txt

View File

9

Documentation/git-repack.txt

View File

93

Documentation/git-rev-list.txt

View File

74

Documentation/git-rev-parse.txt

View File

8

Documentation/git-shortlog.txt

View File

7

Documentation/git-show.txt

View File

8

Documentation/git-stash.txt

View File

2

Documentation/git-status.txt

View File

2

Documentation/git-tag.txt

View File

79

Documentation/git-worktree.txt

View File

24

Documentation/git.txt

View File

41

Documentation/gitdiffcore.txt

View File

123

Documentation/gitmailmap.txt Normal file

View File

2

Documentation/i18n.txt

View File

75

Documentation/mailmap.txt

View File

34

Documentation/pretty-formats.txt

View File

14

Documentation/rev-list-options.txt

View File

116

Documentation/technical/chunk-format.txt Normal file

View File

31

Documentation/technical/commit-graph-format.txt

View File

77

Documentation/technical/commit-graph.txt

View File

293

Documentation/technical/hash-function-transition.txt

View File

42

Documentation/technical/index-format.txt

View File

23

Documentation/technical/pack-format.txt

View File

15

Documentation/technical/protocol-v2.txt

View File

42

Documentation/technical/reftable.txt

View File

2

GIT-VERSION-GEN

View File

55

Makefile

View File

2

RelNotes

View File

64

abspath.c

View File

6

add-interactive.c

View File

33

apply.c

View File