Recently, a vulnerability was reported that can lead to an out-of-bounds
write when reading an unreasonably large gitattributes file. The root
cause of this error are multiple integer overflows in different parts of
the code when there are either too many lines, when paths are too long,
when attribute names are too long, or when there are too many attributes
declared for a pattern.
As all of these are related to size, it seems reasonable to restrict the
size of the gitattributes file via git-fsck(1). This allows us to both
stop distributing known-vulnerable objects via common hosting platforms
that have fsck enabled, and users to protect themselves by enabling the
`fetch.fsckObjects` config.
There are basically two checks:
1. We verify that size of the gitattributes file is smaller than
100MB.
2. We verify that the maximum line length does not exceed 2048
bytes.
With the preceding commits, both of these conditions would cause us to
either ignore the complete gitattributes file or blob in the first case,
or the specific line in the second case. Now with these consistency
checks added, we also grow the ability to stop distributing such files
in the first place when `receive.fsckObjects` is enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the checks for gitattributes so that they can be extended more
readily.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `fsck_finish()` we check all blobs for consistency that we have found
during the tree walk, but that haven't yet been checked. This is only
required for gitmodules right now, but will also be required for a new
check for gitattributes.
Pull out a function `fsck_blobs()` that allows the caller to check a set
of blobs for consistency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In general, we don't need to validate blob contents as they are opaque
blobs about whose content Git doesn't need to care about. There are some
exceptions though when blobs are linked into trees so that they would be
interpreted by Git. We only have a single such check right now though,
which is the one for gitmodules that has been added in the context of
CVE-2018-11235.
Now we have found another vulnerability with gitattributes that can lead
to out-of-bounds writes and reads. So let's refactor `fsck_blob()` so
that it is more extensible and can check different types of blobs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the padding and wrapping formatting directives allow the caller to
specify an integer that ultimately leads to us adding this many chars to
the result buffer. As a consequence, it is trivial to e.g. allocate 2GB
of RAM via a single formatting directive and cause resource exhaustion
on the machine executing this logic. Furthermore, it is debatable
whether there are any sane usecases that require the user to pad data to
2GB boundaries or to indent wrapped data by 2GB.
Restrict the input sizes to 16 kilobytes at a maximum to limit the
amount of bytes that can be requested by the user. This is not meant
as a fix because there are ways to trivially amplify the amount of
data we generate via formatting directives; the real protection is
achieved by the changes in previous steps to catch and avoid integer
wraparound that causes us to under-allocate and access beyond the
end of allocated memory reagions. But having such a limit
significantly helps fuzzing the pretty format, because the fuzzer is
otherwise quite fast to run out-of-memory as it discovers these
formatters.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `strbuf_utf8_replace`, we preallocate the destination buffer and then
use `memcpy` to copy bytes into it at computed offsets. This feels
rather fragile and is hard to understand at times. Refactor the code to
instead use `strbuf_add` and `strbuf_addstr` so that we can be sure that
there is no possibility to perform an out-of-bounds write.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `strbuf_utf8_replace()`, we call `utf8_width()` to compute the width
of the current glyph. If the glyph is a control character though it can
be that `utf8_width()` returns `-1`, but because we assign this value to
a `size_t` the conversion will cause us to underflow. This bug can
easily be triggered with the following command:
$ git log --pretty='format:xxx%<|(1,trunc)%x10'
>From all I can see though this seems to be a benign underflow that has
no security-related consequences.
Fix the bug by using an `int` instead. When we see a control character,
we now copy it into the target buffer but don't advance the current
width of the string.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is
`int`, but we operate on string lengths which are typically of type
`size_t`. This means that when the string is longer than `INT_MAX`, we
will overflow and thus return a negative result.
This can lead to an out-of-bounds write with `--pretty=format:%<1)%B`
and a commit message that is 2^31+1 bytes long:
=================================================================
==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8
WRITE of size 2147483649 at 0x603000001168 thread T0
#0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763
#2 0x5612bbb1087a in format_commit_item pretty.c:1801
#3 0x5612bbc33bab in strbuf_expand strbuf.c:429
#4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
#5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
#6 0x5612bba0a4d5 in show_log log-tree.c:781
#7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
#8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
#9 0x5612bb69235b in cmd_log_walk builtin/log.c:549
#10 0x5612bb6951a2 in cmd_log builtin/log.c:883
#11 0x5612bb56c993 in run_builtin git.c:466
#12 0x5612bb56d397 in handle_builtin git.c:721
#13 0x5612bb56db07 in run_argv git.c:788
#14 0x5612bb56e8a7 in cmd_main git.c:923
#15 0x5612bb803682 in main common-main.c:57
#16 0x7f95c4c3c28f (/usr/lib/libc.so.6+0x2328f)
#17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115
0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168)
allocated by thread T0 here:
#0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
#1 0x5612bbcdd556 in xrealloc wrapper.c:136
#2 0x5612bbc310a3 in strbuf_grow strbuf.c:99
#3 0x5612bbc32acd in strbuf_add strbuf.c:298
#4 0x5612bbc33aec in strbuf_expand strbuf.c:418
#5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
#6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
#7 0x5612bba0a4d5 in show_log log-tree.c:781
#8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
#9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
#10 0x5612bb69235b in cmd_log_walk builtin/log.c:549
#11 0x5612bb6951a2 in cmd_log builtin/log.c:883
#12 0x5612bb56c993 in run_builtin git.c:466
#13 0x5612bb56d397 in handle_builtin git.c:721
#14 0x5612bb56db07 in run_argv git.c:788
#15 0x5612bb56e8a7 in cmd_main git.c:923
#16 0x5612bb803682 in main common-main.c:57
#17 0x7f95c4c3c28f (/usr/lib/libc.so.6+0x2328f)
SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
Shadow bytes around the buggy address:
0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa
0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
=>0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa
0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==26009==ABORTING
Now the proper fix for this would be to convert both functions to return
an `size_t` instead of an `int`. But given that this commit may be part
of a security release, let's instead do the minimal viable fix and die
in case we see an overflow.
Add a test that would have previously caused us to crash.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `utf8_strnwidth()` function calls `utf8_width()` in a loop and adds
its returned width to the end result. `utf8_width()` can return `-1`
though in case it reads a control character, which means that the
computed string width is going to be wrong. In the worst case where
there are more control characters than non-control characters, we may
even return a negative string width.
Fix this bug by treating control characters as having zero width.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `utf8_strnwidth()` function accepts an optional string length as
input parameter. This parameter can either be set to `-1`, in which case
we call `strlen()` on the input. Or it can be set to a positive integer
that indicates a precomputed length, which callers typically compute by
calling `strlen()` at some point themselves.
The input parameter is an `int` though, whereas `strlen()` returns a
`size_t`. This can lead to implementation-defined behaviour though when
the `size_t` cannot be represented by the `int`. In the general case
though this leads to wrap-around and thus to negative string sizes,
which is sure enough to not lead to well-defined behaviour.
Fix this by accepting a `size_t` instead of an `int` as string length.
While this takes away the ability of callers to simply pass in `-1` as
string length, it really is trivial enough to convert them to instead
pass in `strlen()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `%w(width,indent1,indent2)` formatting directive can be used to
rewrap text to a specific width and is designed after git-shortlog(1)'s
`-w` parameter. While the three parameters are all stored as `size_t`
internally, `strbuf_add_wrapped_text()` accepts integers as input. As a
result, the casted integers may overflow. As these now-negative integers
are later on passed to `strbuf_addchars()`, we will ultimately run into
implementation-defined behaviour due to casting a negative number back
to `size_t` again. On my platform, this results in trying to allocate
9000 petabyte of memory.
Fix this overflow by using `cast_size_t_to_int()` so that we reject
inputs that cannot be represented as an integer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a formatting directive has a `+` or ` ` after the `%`, then we add
either a line feed or space if the placeholder expands to a non-empty
string. In specific cases though this logic doesn't work as expected,
and we try to add the character even in the case where the formatting
directive is empty.
One such pattern is `%w(1)%+d%+w(2)`. `%+d` expands to reference names
pointing to a certain commit, like in `git log --decorate`. For a tagged
commit this would for example expand to `\n (tag: v1.0.0)`, which has a
leading newline due to the `+` modifier and a space added by `%d`. Now
the second wrapping directive will cause us to rewrap the text to
`\n(tag:\nv1.0.0)`, which is one byte shorter due to the missing leading
space. The code that handles the `+` magic now notices that the length
has changed and will thus try to insert a leading line feed at the
original posititon. But as the string was shortened, the original
position is past the buffer's boundary and thus we die with an error.
Now there are two issues here:
1. We check whether the buffer length has changed, not whether it
has been extended. This causes us to try and add the character
past the string boundary.
2. The current logic does not make any sense whatsoever. When the
string got expanded due to the rewrap, putting the separator into
the original position is likely to put it somewhere into the
middle of the rewrapped contents.
It is debatable whether `%+w()` makes any sense in the first place.
Strictly speaking, the placeholder never expands to a non-empty string,
and consequentially we shouldn't ever accept this combination. We thus
fix the bug by simply refusing `%+w()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An out-of-bounds read can be triggered when parsing an incomplete
padding format string passed via `--pretty=format` or in Git archives
when files are marked with the `export-subst` gitattribute.
This bug exists since we have introduced support for truncating output
via the `trunc` keyword a7f01c6b4d (pretty: support truncating in %>, %<
and %><, 2013-04-19). Before this commit, we used to find the end of the
formatting string by using strchr(3P). This function returns a `NULL`
pointer in case the character in question wasn't found. The subsequent
check whether any character was found thus simply checked the returned
pointer. After the commit we switched to strcspn(3P) though, which only
returns the offset to the first found character or to the trailing NUL
byte. As the end pointer is now computed by adding the offset to the
start pointer it won't be `NULL` anymore, and as a consequence the check
doesn't do anything anymore.
The out-of-bounds data that is being read can in fact end up in the
formatted string. As a consequence, it is possible to leak memory
contents either by calling git-log(1) or via git-archive(1) when any of
the archived files is marked with the `export-subst` gitattribute.
==10888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000398 at pc 0x7f0356047cb2 bp 0x7fff3ffb95d0 sp 0x7fff3ffb8d78
READ of size 1 at 0x602000000398 thread T0
#0 0x7f0356047cb1 in __interceptor_strchrnul /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725
#1 0x563b7cec9a43 in strbuf_expand strbuf.c:417
#2 0x563b7cda7060 in repo_format_commit_message pretty.c:1869
#3 0x563b7cda8d0f in pretty_print_commit pretty.c:2161
#4 0x563b7cca04c8 in show_log log-tree.c:781
#5 0x563b7cca36ba in log_tree_commit log-tree.c:1117
#6 0x563b7c927ed5 in cmd_log_walk_no_free builtin/log.c:508
#7 0x563b7c92835b in cmd_log_walk builtin/log.c:549
#8 0x563b7c92b1a2 in cmd_log builtin/log.c:883
#9 0x563b7c802993 in run_builtin git.c:466
#10 0x563b7c803397 in handle_builtin git.c:721
#11 0x563b7c803b07 in run_argv git.c:788
#12 0x563b7c8048a7 in cmd_main git.c:923
#13 0x563b7ca99682 in main common-main.c:57
#14 0x7f0355e3c28f (/usr/lib/libc.so.6+0x2328f)
#15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115
0x602000000398 is located 0 bytes to the right of 8-byte region [0x602000000390,0x602000000398)
allocated by thread T0 here:
#0 0x7f0356072faa in __interceptor_strdup /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x563b7cf7317c in xstrdup wrapper.c:39
#2 0x563b7cd9a06a in save_user_format pretty.c:40
#3 0x563b7cd9b3e5 in get_commit_format pretty.c:173
#4 0x563b7ce54ea0 in handle_revision_opt revision.c:2456
#5 0x563b7ce597c9 in setup_revisions revision.c:2850
#6 0x563b7c9269e0 in cmd_log_init_finish builtin/log.c:269
#7 0x563b7c927362 in cmd_log_init builtin/log.c:348
#8 0x563b7c92b193 in cmd_log builtin/log.c:882
#9 0x563b7c802993 in run_builtin git.c:466
#10 0x563b7c803397 in handle_builtin git.c:721
#11 0x563b7c803b07 in run_argv git.c:788
#12 0x563b7c8048a7 in cmd_main git.c:923
#13 0x563b7ca99682 in main common-main.c:57
#14 0x7f0355e3c28f (/usr/lib/libc.so.6+0x2328f)
#15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115
SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725 in __interceptor_strchrnul
Shadow bytes around the buggy address:
0x0c047fff8020: fa fa fd fd fa fa 00 06 fa fa 05 fa fa fa fd fd
0x0c047fff8030: fa fa 00 02 fa fa 06 fa fa fa 05 fa fa fa fd fd
0x0c047fff8040: fa fa 00 07 fa fa 03 fa fa fa fd fd fa fa 00 00
0x0c047fff8050: fa fa 00 01 fa fa fd fd fa fa 00 00 fa fa 00 01
0x0c047fff8060: fa fa 00 06 fa fa 00 06 fa fa 05 fa fa fa 05 fa
=>0x0c047fff8070: fa fa 00[fa]fa fa fd fa fa fa fd fd fa fa fd fd
0x0c047fff8080: fa fa fd fd fa fa 00 00 fa fa 00 fa fa fa fd fa
0x0c047fff8090: fa fa fd fd fa fa 00 00 fa fa fa fa fa fa fa fa
0x0c047fff80a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff80b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff80c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==10888==ABORTING
Fix this bug by checking whether `end` points at the trailing NUL byte.
Add a test which catches this out-of-bounds read and which demonstrates
that we used to write out-of-bounds data into the formatted message.
Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Original-patch-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the `%>>(<N>)` pretty formatter, you can ask git-log(1) et al to
steal spaces. To do so we need to look ahead of the next token to see
whether there are spaces there. This loop takes into account ANSI
sequences that end with an `m`, and if it finds any it will skip them
until it finds the first space. While doing so it does not take into
account the buffer's limits though and easily does an out-of-bounds
read.
Add a test that hits this behaviour. While we don't have an easy way to
verify this, the test causes the following failure when run with
`SANITIZE=address`:
==37941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000baf at pc 0x55ba6f88e0d0 bp 0x7ffc84c50d20 sp 0x7ffc84c50d10
READ of size 1 at 0x603000000baf thread T0
#0 0x55ba6f88e0cf in format_and_pad_commit pretty.c:1712
#1 0x55ba6f88e7b4 in format_commit_item pretty.c:1801
#2 0x55ba6f9b1ae4 in strbuf_expand strbuf.c:429
#3 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
#4 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
#5 0x55ba6f7884c8 in show_log log-tree.c:781
#6 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
#7 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
#8 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
#9 0x55ba6f4131a2 in cmd_log builtin/log.c:883
#10 0x55ba6f2ea993 in run_builtin git.c:466
#11 0x55ba6f2eb397 in handle_builtin git.c:721
#12 0x55ba6f2ebb07 in run_argv git.c:788
#13 0x55ba6f2ec8a7 in cmd_main git.c:923
#14 0x55ba6f581682 in main common-main.c:57
#15 0x7f2d08c3c28f (/usr/lib/libc.so.6+0x2328f)
#16 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#17 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115
0x603000000baf is located 1 bytes to the left of 24-byte region [0x603000000bb0,0x603000000bc8)
allocated by thread T0 here:
#0 0x7f2d08ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
#1 0x55ba6fa5b494 in xrealloc wrapper.c:136
#2 0x55ba6f9aefdc in strbuf_grow strbuf.c:99
#3 0x55ba6f9b0a06 in strbuf_add strbuf.c:298
#4 0x55ba6f9b1a25 in strbuf_expand strbuf.c:418
#5 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
#6 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
#7 0x55ba6f7884c8 in show_log log-tree.c:781
#8 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
#9 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
#10 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
#11 0x55ba6f4131a2 in cmd_log builtin/log.c:883
#12 0x55ba6f2ea993 in run_builtin git.c:466
#13 0x55ba6f2eb397 in handle_builtin git.c:721
#14 0x55ba6f2ebb07 in run_argv git.c:788
#15 0x55ba6f2ec8a7 in cmd_main git.c:923
#16 0x55ba6f581682 in main common-main.c:57
#17 0x7f2d08c3c28f (/usr/lib/libc.so.6+0x2328f)
#18 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#19 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115
SUMMARY: AddressSanitizer: heap-buffer-overflow pretty.c:1712 in format_and_pad_commit
Shadow bytes around the buggy address:
0x0c067fff8120: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
0x0c067fff8130: fd fd fa fa fd fd fd fd fa fa fd fd fd fa fa fa
0x0c067fff8140: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
0x0c067fff8150: fa fa fd fd fd fd fa fa 00 00 00 fa fa fa fd fd
0x0c067fff8160: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
=>0x0c067fff8170: fd fd fd fa fa[fa]00 00 00 fa fa fa 00 00 00 fa
0x0c067fff8180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8190: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff81a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff81b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff81c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Luckily enough, this would only cause us to copy the out-of-bounds data
into the formatted commit in case we really had an ANSI sequence
preceding our buffer. So this bug likely has no security consequences.
Fix it regardless by not traversing past the buffer's start.
Reported-by: Patrick Steinhardt <ps@pks.im>
Reported-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using a padding specifier in the pretty format passed to git-log(1)
we need to calculate the string length in several places. These string
lengths are stored in `int`s though, which means that these can easily
overflow when the input lengths exceeds 2GB. This can ultimately lead to
an out-of-bounds write when these are used in a call to memcpy(3P):
==8340==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1ec62f97fe at pc 0x7f2127e5f427 bp 0x7ffd3bd63de0 sp 0x7ffd3bd63588
WRITE of size 1 at 0x7f1ec62f97fe thread T0
#0 0x7f2127e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x5628e96aa605 in format_and_pad_commit pretty.c:1762
#2 0x5628e96aa7f4 in format_commit_item pretty.c:1801
#3 0x5628e97cdb24 in strbuf_expand strbuf.c:429
#4 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
#5 0x5628e96acd0f in pretty_print_commit pretty.c:2161
#6 0x5628e95a44c8 in show_log log-tree.c:781
#7 0x5628e95a76ba in log_tree_commit log-tree.c:1117
#8 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
#9 0x5628e922c35b in cmd_log_walk builtin/log.c:549
#10 0x5628e922f1a2 in cmd_log builtin/log.c:883
#11 0x5628e9106993 in run_builtin git.c:466
#12 0x5628e9107397 in handle_builtin git.c:721
#13 0x5628e9107b07 in run_argv git.c:788
#14 0x5628e91088a7 in cmd_main git.c:923
#15 0x5628e939d682 in main common-main.c:57
#16 0x7f2127c3c28f (/usr/lib/libc.so.6+0x2328f)
#17 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#18 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115
0x7f1ec62f97fe is located 2 bytes to the left of 4831838265-byte region [0x7f1ec62f9800,0x7f1fe62f9839)
allocated by thread T0 here:
#0 0x7f2127ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
#1 0x5628e98774d4 in xrealloc wrapper.c:136
#2 0x5628e97cb01c in strbuf_grow strbuf.c:99
#3 0x5628e97ccd42 in strbuf_addchars strbuf.c:327
#4 0x5628e96aa55c in format_and_pad_commit pretty.c:1761
#5 0x5628e96aa7f4 in format_commit_item pretty.c:1801
#6 0x5628e97cdb24 in strbuf_expand strbuf.c:429
#7 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
#8 0x5628e96acd0f in pretty_print_commit pretty.c:2161
#9 0x5628e95a44c8 in show_log log-tree.c:781
#10 0x5628e95a76ba in log_tree_commit log-tree.c:1117
#11 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
#12 0x5628e922c35b in cmd_log_walk builtin/log.c:549
#13 0x5628e922f1a2 in cmd_log builtin/log.c:883
#14 0x5628e9106993 in run_builtin git.c:466
#15 0x5628e9107397 in handle_builtin git.c:721
#16 0x5628e9107b07 in run_argv git.c:788
#17 0x5628e91088a7 in cmd_main git.c:923
#18 0x5628e939d682 in main common-main.c:57
#19 0x7f2127c3c28f (/usr/lib/libc.so.6+0x2328f)
#20 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#21 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115
SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
Shadow bytes around the buggy address:
0x0fe458c572a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fe458c572b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fe458c572c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fe458c572d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fe458c572e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0fe458c572f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
0x0fe458c57300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fe458c57310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fe458c57320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fe458c57330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fe458c57340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==8340==ABORTING
The pretty format can also be used in `git archive` operations via the
`export-subst` attribute. So this is what in our opinion makes this a
critical issue in the context of Git forges which allow to download an
archive of user supplied Git repositories.
Fix this vulnerability by using `size_t` instead of `int` to track the
string lengths. Add tests which detect this vulnerability when Git is
compiled with the address sanitizer.
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Original-patch-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Modified-by: Taylor Blau <me@ttalorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow tests that assume a 64-bit `size_t` to be skipped in 32-bit
platforms and regardless of the size of `long`.
This imitates the `LONG_IS_64BIT` prerequisite.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, start ignoring gitattributes files
that are overly large to protect us against out-of-bounds reads and
writes caused by integer overflows. Unfortunately, we cannot just define
"overly large" in terms of any preexisting limits in the codebase.
Instead, we choose a very conservative limit of 100MB. This is plenty of
room for specifying gitattributes, and incidentally it is also the limit
for blob sizes for GitHub. While we don't want GitHub to dictate limits
here, it is still sensible to use this fact for an informed decision
given that it is hosting a huge set of repositories. Furthermore, over
at GitLab we scanned a subset of repositories for their root-level
attribute files. We found that 80% of them have a gitattributes file
smaller than 100kB, 99.99% have one smaller than 1MB, and only a single
repository had one that was almost 3MB in size. So enforcing a limit of
100MB seems to give us ample of headroom.
With this limit in place we can be reasonably sure that there is no easy
way to exploit the gitattributes file via integer overflows anymore.
Furthermore, it protects us against resource exhaustion caused by
allocating the in-memory data structures required to represent the
parsed attributes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two different code paths to read gitattributes: once via a
file, and once via the index. These two paths used to behave differently
because when reading attributes from a file, we used fgets(3P) with a
buffer size of 2kB. Consequentially, we silently truncate line lengths
when lines are longer than that and will then parse the remainder of the
line as a new pattern. It goes without saying that this is entirely
unexpected, but it's even worse that the behaviour depends on how the
gitattributes are parsed.
While this is simply wrong, the silent truncation saves us with the
recently discovered vulnerabilities that can cause out-of-bound writes
or reads with unreasonably long lines due to integer overflows. As the
common path is to read gitattributes via the worktree file instead of
via the index, we can assume that any gitattributes file that had lines
longer than that is already broken anyway. So instead of lifting the
limit here, we can double down on it to fix the vulnerabilities.
Introduce an explicit line length limit of 2kB that is shared across all
paths that read attributes and ignore any line that hits this limit
while printing a warning.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading attributes from a file we use fgets(3P) with a buffer size
of 2048 bytes. This means that as soon as a line exceeds the buffer size
we split it up into multiple parts and parse each of them as a separate
pattern line. This is of course not what the user intended, and even
worse the behaviour is inconsistent with how we read attributes from the
index.
Fix this bug by converting the code to use `strbuf_getline()` instead.
This will indeed read in the whole line, which may theoretically lead to
an out-of-memory situation when the gitattributes file is huge. We're
about to reject any gitattributes files larger than 100MB in the next
commit though, which makes this less of a concern.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing an attributes line, we need to allocate an array that holds
all attributes specified for the given file pattern. The calculation to
determine the number of bytes that need to be allocated was prone to an
overflow though when there was an unreasonable amount of attributes.
Harden the allocation by instead using the `st_` helper functions that
cause us to die when we hit an integer overflow.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Attributes have a field that tracks the position in the `all_attrs`
array they're stored inside. This field gets set via `hashmap_get_size`
when adding the attribute to the global map of attributes. But while the
field is of type `int`, the value returned by `hashmap_get_size` is an
`unsigned int`. It can thus happen that the value overflows, where we
would now dereference teh `all_attrs` array at an out-of-bounds value.
We do have a sanity check for this overflow via an assert that verifies
the index matches the new hashmap's size. But asserts are not a proper
mechanism to detect against any such overflows as they may not in fact
be compiled into production code.
Fix this by using an `unsigned int` to track the index and convert the
assert to a call `die()`.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct attr_stack` tracks the stack of all patterns together with
their attributes. When parsing a gitattributes file that has more than
2^31 such patterns though we may trigger multiple out-of-bounds reads on
64 bit platforms. This is because while the `num_matches` variable is an
unsigned integer, we always use a signed integer to iterate over them.
I have not been able to reproduce this issue due to memory constraints
on my systems. But despite the out-of-bounds reads, the worst thing that
can seemingly happen is to call free(3P) with a garbage pointer when
calling `attr_stack_free()`.
Fix this bug by using unsigned integers to iterate over the array. While
this makes the iteration somewhat awkward when iterating in reverse, it
is at least better than knowingly running into an out-of-bounds read.
While at it, convert the call to `ALLOC_GROW` to use `ALLOC_GROW_BY`
instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to trigger an integer overflow when parsing attribute
names when there are more than 2^31 of them for a single pattern. This
can either lead to us dying due to trying to request too many bytes:
blob=$(perl -e 'print "f" . " a=" x 2147483649' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,$blob,.gitattributes
git attr-check --all file
=================================================================
==1022==ERROR: AddressSanitizer: requested allocation size 0xfffffff800000032 (0xfffffff800001038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
#0 0x7fd3efabf411 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x5563a0a1e3d3 in xcalloc wrapper.c:150
#2 0x5563a058d005 in parse_attr_line attr.c:384
#3 0x5563a058e661 in handle_attr_line attr.c:660
#4 0x5563a058eddb in read_attr_from_index attr.c:769
#5 0x5563a058ef12 in read_attr attr.c:797
#6 0x5563a058f24c in bootstrap_attr_stack attr.c:867
#7 0x5563a058f4a3 in prepare_attr_stack attr.c:902
#8 0x5563a05905da in collect_some_attrs attr.c:1097
#9 0x5563a059093d in git_all_attrs attr.c:1128
#10 0x5563a02f636e in check_attr builtin/check-attr.c:67
#11 0x5563a02f6c12 in cmd_check_attr builtin/check-attr.c:183
#12 0x5563a02aa993 in run_builtin git.c:466
#13 0x5563a02ab397 in handle_builtin git.c:721
#14 0x5563a02abb2b in run_argv git.c:788
#15 0x5563a02ac991 in cmd_main git.c:926
#16 0x5563a05432bd in main common-main.c:57
#17 0x7fd3ef82228f (/usr/lib/libc.so.6+0x2328f)
==1022==HINT: if you don't care about these errors you may set allocator_may_return_null=1
SUMMARY: AddressSanitizer: allocation-size-too-big /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 in __interceptor_calloc
==1022==ABORTING
Or, much worse, it can lead to an out-of-bounds write because we
underallocate and then memcpy(3P) into an array:
perl -e '
print "A " . "\rh="x2000000000;
print "\rh="x2000000000;
print "\rh="x294967294 . "\n"
' >.gitattributes
git add .gitattributes
git commit -am "evil attributes"
$ git clone --quiet /path/to/repo
=================================================================
==15062==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002550 at pc 0x5555559884d5 bp 0x7fffffffbc60 sp 0x7fffffffbc58
WRITE of size 8 at 0x602000002550 thread T0
#0 0x5555559884d4 in parse_attr_line attr.c:393
#1 0x5555559884d4 in handle_attr_line attr.c:660
#2 0x555555988902 in read_attr_from_index attr.c:784
#3 0x555555988902 in read_attr_from_index attr.c:747
#4 0x555555988a1d in read_attr attr.c:800
#5 0x555555989b0c in bootstrap_attr_stack attr.c:882
#6 0x555555989b0c in prepare_attr_stack attr.c:917
#7 0x555555989b0c in collect_some_attrs attr.c:1112
#8 0x55555598b141 in git_check_attr attr.c:1126
#9 0x555555a13004 in convert_attrs convert.c:1311
#10 0x555555a95e04 in checkout_entry_ca entry.c:553
#11 0x555555d58bf6 in checkout_entry entry.h:42
#12 0x555555d58bf6 in check_updates unpack-trees.c:480
#13 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
#14 0x555555785ab7 in checkout builtin/clone.c:724
#15 0x555555785ab7 in cmd_clone builtin/clone.c:1384
#16 0x55555572443c in run_builtin git.c:466
#17 0x55555572443c in handle_builtin git.c:721
#18 0x555555727872 in run_argv git.c:788
#19 0x555555727872 in cmd_main git.c:926
#20 0x555555721fa0 in main common-main.c:57
#21 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
#22 0x555555723f39 in _start (git+0x1cff39)
0x602000002552 is located 0 bytes to the right of 2-byte region [0x602000002550,0x602000002552) allocated by thread T0 here:
#0 0x7ffff768c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x555555d7fff7 in xcalloc wrapper.c:150
#2 0x55555598815f in parse_attr_line attr.c:384
#3 0x55555598815f in handle_attr_line attr.c:660
#4 0x555555988902 in read_attr_from_index attr.c:784
#5 0x555555988902 in read_attr_from_index attr.c:747
#6 0x555555988a1d in read_attr attr.c:800
#7 0x555555989b0c in bootstrap_attr_stack attr.c:882
#8 0x555555989b0c in prepare_attr_stack attr.c:917
#9 0x555555989b0c in collect_some_attrs attr.c:1112
#10 0x55555598b141 in git_check_attr attr.c:1126
#11 0x555555a13004 in convert_attrs convert.c:1311
#12 0x555555a95e04 in checkout_entry_ca entry.c:553
#13 0x555555d58bf6 in checkout_entry entry.h:42
#14 0x555555d58bf6 in check_updates unpack-trees.c:480
#15 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
#16 0x555555785ab7 in checkout builtin/clone.c:724
#17 0x555555785ab7 in cmd_clone builtin/clone.c:1384
#18 0x55555572443c in run_builtin git.c:466
#19 0x55555572443c in handle_builtin git.c:721
#20 0x555555727872 in run_argv git.c:788
#21 0x555555727872 in cmd_main git.c:926
#22 0x555555721fa0 in main common-main.c:57
#23 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-buffer-overflow attr.c:393 in parse_attr_line
Shadow bytes around the buggy address:
0x0c047fff8450: fa fa 00 02 fa fa 00 07 fa fa fd fd fa fa 00 00
0x0c047fff8460: fa fa 02 fa fa fa fd fd fa fa 00 06 fa fa 05 fa
0x0c047fff8470: fa fa fd fd fa fa 00 02 fa fa 06 fa fa fa 05 fa
0x0c047fff8480: fa fa 07 fa fa fa fd fd fa fa 00 01 fa fa 00 02
0x0c047fff8490: fa fa 00 03 fa fa 00 fa fa fa 00 01 fa fa 00 03
=>0x0c047fff84a0: fa fa 00 01 fa fa 00 02 fa fa[02]fa fa fa fa fa
0x0c047fff84b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==15062==ABORTING
Fix this bug by using `size_t` instead to count the number of attributes
so that this value cannot reasonably overflow without running out of
memory before already.
Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to trigger an integer overflow when parsing attribute
names that are longer than 2^31 bytes because we assign the result of
strlen(3P) to an `int` instead of to a `size_t`. This can lead to an
abort in vsnprintf(3P) with the following reproducer:
blob=$(perl -e 'print "A " . "B"x2147483648 . "\n"' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,$blob,.gitattributes
git check-attr --all path
BUG: strbuf.c:400: your vsnprintf is broken (returned -1)
But furthermore, assuming that the attribute name is even longer than
that, it can cause us to silently truncate the attribute and thus lead
to wrong results.
Fix this integer overflow by using a `size_t` instead. This fixes the
silent truncation of attribute names, but it only partially fixes the
BUG we hit: even though the initial BUG is fixed, we can still hit a BUG
when parsing invalid attribute lines via `report_invalid_attr()`.
This is due to an underlying design issue in vsnprintf(3P) which only
knows to return an `int`, and thus it may always overflow with large
inputs. This issue is benign though: the worst that can happen is that
the error message is misreported to be either truncated or too long, but
due to the buffer being NUL terminated we wouldn't ever do an
out-of-bounds read here.
Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is an out-of-bounds read possible when parsing gitattributes that
have an attribute that is 2^31+1 bytes long. This is caused due to an
integer overflow when we assign the result of strlen(3P) to an `int`,
where we use the wrapped-around value in a subsequent call to
memcpy(3P). The following code reproduces the issue:
blob=$(perl -e 'print "a" x 2147483649 . " attr"' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,$blob,.gitattributes
git check-attr --all file
AddressSanitizer:DEADLYSIGNAL
=================================================================
==8451==ERROR: AddressSanitizer: SEGV on unknown address 0x7f93efa00800 (pc 0x7f94f1f8f082 bp 0x7ffddb59b3a0 sp 0x7ffddb59ab28 T0)
==8451==The signal is caused by a READ memory access.
#0 0x7f94f1f8f082 (/usr/lib/libc.so.6+0x176082)
#1 0x7f94f2047d9c in __interceptor_strspn /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:752
#2 0x560e190f7f26 in parse_attr_line attr.c:375
#3 0x560e190f9663 in handle_attr_line attr.c:660
#4 0x560e190f9ddd in read_attr_from_index attr.c:769
#5 0x560e190f9f14 in read_attr attr.c:797
#6 0x560e190fa24e in bootstrap_attr_stack attr.c:867
#7 0x560e190fa4a5 in prepare_attr_stack attr.c:902
#8 0x560e190fb5dc in collect_some_attrs attr.c:1097
#9 0x560e190fb93f in git_all_attrs attr.c:1128
#10 0x560e18e6136e in check_attr builtin/check-attr.c:67
#11 0x560e18e61c12 in cmd_check_attr builtin/check-attr.c:183
#12 0x560e18e15993 in run_builtin git.c:466
#13 0x560e18e16397 in handle_builtin git.c:721
#14 0x560e18e16b2b in run_argv git.c:788
#15 0x560e18e17991 in cmd_main git.c:926
#16 0x560e190ae2bd in main common-main.c:57
#17 0x7f94f1e3c28f (/usr/lib/libc.so.6+0x2328f)
#18 0x7f94f1e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#19 0x560e18e110e4 in _start ../sysdeps/x86_64/start.S:115
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x176082)
==8451==ABORTING
Fix this bug by converting the variable to a `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `git_attr_internal()` is called to upsert attributes into
the global map. And while all callers pass a `size_t`, the function
itself accepts an `int` as the attribute name's length. This can lead to
an integer overflow in case the attribute name is longer than `INT_MAX`.
Now this overflow seems harmless as the first thing we do is to call
`attr_name_valid()`, and that function only succeeds in case all chars
in the range of `namelen` match a certain small set of chars. We thus
can't do an out-of-bounds read as NUL is not part of that set and all
strings passed to this function are NUL-terminated. And furthermore, we
wouldn't ever read past the current attribute name anyway due to the
same reason. And if validation fails we will return early.
On the other hand it feels fragile to rely on this behaviour, even more
so given that we pass `namelen` to `FLEX_ALLOC_MEM()`. So let's instead
just do the correct thing here and accept a `size_t` as line length.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly cloning over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This function improperly uses an int to represent the number of entries
in the resulting argument array. This allows a malicious actor to
intentionally overflow the return value, leading to arbitrary heap
writes.
Because the resulting argv array is typically passed to execv(), it may
be possible to leverage this attack to gain remote code execution on a
victim machine. This was almost certainly the case for certain
configurations of git-shell until the previous commit limited the size
of input it would accept. Other calls to split_cmdline() are typically
limited by the size of argv the OS is willing to hand us, so are
similarly protected.
So this is not strictly fixing a known vulnerability, but is a hardening
of the function that is worth doing to protect against possible unknown
vulnerabilities.
One approach to fixing this would be modifying the signature of
`split_cmdline()` to look something like:
int split_cmdline(char *cmdline, const char ***argv, size_t *argc);
Where the return value of `split_cmdline()` is negative for errors, and
zero otherwise. If non-NULL, the `*argc` pointer is modified to contain
the size of the `**argv` array.
But this implies an absurdly large `argv` array, which more than likely
larger than the system's argument limit. So even if split_cmdline()
allowed this, it would fail immediately afterwards when we called
execv(). So instead of converting all of `split_cmdline()`'s callers to
work with `size_t` types in this patch, instead pursue the minimal fix
here to prevent ever returning an array with more than INT_MAX entries
in it.
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When git-shell is run in interactive mode (which must be enabled by
creating $HOME/git-shell-commands), it reads commands from stdin, one
per line, and executes them.
We read the commands with git_read_line_interactively(), which uses a
strbuf under the hood. That means we'll accept an input of arbitrary
size (limited only by how much heap we can allocate). That creates two
problems:
- the rest of the code is not prepared to handle large inputs. The
most serious issue here is that split_cmdline() uses "int" for most
of its types, which can lead to integer overflow and out-of-bounds
array reads and writes. But even with that fixed, we assume that we
can feed the command name to snprintf() (via xstrfmt()), which is
stuck for historical reasons using "int", and causes it to fail (and
even trigger a BUG() call).
- since the point of git-shell is to take input from untrusted or
semi-trusted clients, it's a mild denial-of-service. We'll allocate
as many bytes as the client sends us (actually twice as many, since
we immediately duplicate the buffer).
We can fix both by just limiting the amount of per-command input we're
willing to receive.
We should also fix split_cmdline(), of course, which is an accident
waiting to happen, but that can come on top. Most calls to
split_cmdline(), including the other one in git-shell, are OK because
they are reading from an OS-provided argv, which is limited in practice.
This patch should eliminate the immediate vulnerabilities.
I picked 4MB as an arbitrary limit. It's big enough that nobody should
ever run into it in practice (since the point is to run the commands via
exec, we're subject to OS limits which are typically much lower). But
it's small enough that allocating it isn't that big a deal.
The code is mostly just swapping out fgets() for the strbuf call, but we
have to add a few niceties like flushing and trimming line endings. We
could simplify things further by putting the buffer on the stack, but
4MB is probably a bit much there. Note that we'll _always_ allocate 4MB,
which for normal, non-malicious requests is more than we would before
this patch. But on the other hand, other git programs are happy to use
96MB for a delta cache. And since we'd never touch most of those pages,
on a lazy-allocating OS like Linux they won't even get allocated to
actual RAM.
The ideal would be a version of strbuf_getline() that accepted a maximum
value. But for a minimal vulnerability fix, let's keep things localized
and simple. We can always refactor further on top.
The included test fails in an obvious way with ASan or UBSan (which
notice the integer overflow and out-of-bounds reads). Without them, it
fails in a less obvious way: we may segfault, or we may try to xstrfmt()
a long string, leading to a BUG(). Either way, it fails reliably before
this patch, and passes with it. Note that we don't need an EXPENSIVE
prereq on it. It does take 10-15s to fail before this patch, but with
the new limit, we fail almost immediately (and the perl process
generating 2GB of data exits via SIGPIPE).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We have no tests of even basic functionality of git-shell. Let's add a
couple of obvious ones. This will serve as a framework for adding tests
for new things we fix, as well as making sure we don't screw anything up
too badly while doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
An earlier patch discussed and fixed a scenario where Git could be used
as a vector to exfiltrate sensitive data through a Docker container when
a potential victim clones a suspicious repository with local submodules
that contain symlinks.
That security hole has since been plugged, but a similar one still
exists. Instead of convincing a would-be victim to clone an embedded
submodule via the "file" protocol, an attacker could convince an
individual to clone a repository that has a submodule pointing to a
valid path on the victim's filesystem.
For example, if an individual (with username "foo") has their home
directory ("/home/foo") stored as a Git repository, then an attacker
could exfiltrate data by convincing a victim to clone a malicious
repository containing a submodule pointing at "/home/foo/.git" with
`--recurse-submodules`. Doing so would expose any sensitive contents in
stored in "/home/foo" tracked in Git.
For systems (such as Docker) that consider everything outside of the
immediate top-level working directory containing a Dockerfile as
inaccessible to the container (with the exception of volume mounts, and
so on), this is a violation of trust by exposing unexpected contents in
the working copy.
To mitigate the likelihood of this kind of attack, adjust the "file://"
protocol's default policy to be "user" to prevent commands that execute
without user input (including recursive submodule initialization) from
taking place by default.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that interact with submodules a handful of times use
`test_config_global`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for changing the default value of `protocol.file.allow` to
"user", update the `prolog()` function in lib-submodule-update to allow
submodules to be cloned over the file protocol.
This is used by a handful of submodule-related test scripts, which
themselves will have to tweak the value of `protocol.file.allow` in
certain locations. Those will be done in subsequent commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the callpath `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.
The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.
One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.
Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of file stored outside of
the repository.
Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.
That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.
This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.
One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinking the
objects directory are not well-supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).
Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.
The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This has
no bearing on the overall behavior, since we will immediately `die()` on
encounter a symlink.
Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
* maint-2.32:
Git 2.32.3
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.31:
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.30:
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
8959555cee (setup_git_directory(): add an owner check for the top-level
directory, 2022-03-02), adds a function to check for ownership of
repositories using a directory that is representative of it, and ways to
add exempt a specific repository from said check if needed, but that
check didn't account for owership of the gitdir, or (when used) the
gitfile that points to that gitdir.
An attacker could create a git repository in a directory that they can
write into but that is owned by the victim to work around the fix that
was introduced with CVE-2022-24765 to potentially run code as the
victim.
An example that could result in privilege escalation to root in *NIX would
be to set a repository in a shared tmp directory by doing (for example):
$ git -C /tmp init
To avoid that, extend the ensure_valid_ownership function to be able to
check for all three paths.
This will have the side effect of tripling the number of stat() calls
when a repository is detected, but the effect is expected to be likely
minimal, as it is done only once during the directory walk in which Git
looks for a repository.
Additionally make sure to resolve the gitfile (if one was used) to find
the relevant gitdir for checking.
While at it change the message printed on failure so it is clear we are
referring to the repository by its worktree (or gitdir if it is bare) and
not to a specific directory.
Helped-by: Junio C Hamano <junio@pobox.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
With a recent update to refuse access to repositories of other
people by default, "sudo make install" and "sudo git describe"
stopped working. This series intends to loosen it while keeping
the safety.
* cb/path-owner-check-with-sudo:
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Previous changes introduced a regression which will prevent root for
accessing repositories owned by thyself if using sudo because SUDO_UID
takes precedence.
Loosen that restriction by allowing root to access repositories owned
by both uid by default and without having to add a safe.directory
exception.
A previous workaround that was documented in the tests is no longer
needed so it has been removed together with its specially crafted
prerequisite.
Helped-by: Johanness Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a support library that provides one function that can be used
to run a "scriplet" of commands through sudo and that helps invoking
sudo in the slightly awkward way that is required to ensure it doesn't
block the call (if shell was allowed as tested in the prerequisite)
and it doesn't run the command through a different shell than the one
we intended.
Add additional negative tests as suggested by Junio and that use a
new workspace that is owned by root.
Document a regression that was introduced by previous commits where
root won't be able anymore to access directories they own unless
SUDO_UID is removed from their environment.
The tests document additional ways that this new restriction could
be worked around and the documentation explains why it might be instead
considered a feature, but a "fix" is planned for a future change.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
bdc77d1d68 (Add a function to determine whether a path is owned by the
current user, 2022-03-02) checks for the effective uid of the running
process using geteuid() but didn't account for cases where that user was
root (because git was invoked through sudo or a compatible tool) and the
original uid that repository trusted for its config was no longer known,
therefore failing the following otherwise safe call:
guy@renard ~/Software/uncrustify $ sudo git describe --always --dirty
[sudo] password for guy:
fatal: unsafe repository ('/home/guy/Software/uncrustify' is owned by someone else)
Attempt to detect those cases by using the environment variables that
those tools create to keep track of the original user id, and do the
ownership check using that instead.
This assumes the environment the user is running on after going
privileged can't be tampered with, and also adds code to restrict that
the new behavior only applies if running as root, therefore keeping the
most common case, which runs unprivileged, from changing, but because of
that, it will miss cases where sudo (or an equivalent) was used to change
to another unprivileged user or where the equivalent tool used to raise
privileges didn't track the original id in a sudo compatible way.
Because of compatibility with sudo, the code assumes that uid_t is an
unsigned integer type (which is not required by the standard) but is used
that way in their codebase to generate SUDO_UID. In systems where uid_t
is signed, sudo might be also patched to NOT be unsigned and that might
be able to trigger an edge case and a bug (as described in the code), but
it is considered unlikely to happen and even if it does, the code would
just mostly fail safely, so there was no attempt either to detect it or
prevent it by the code, which is something that might change in the future,
based on expected user feedback.
Reported-by: Guy Maurel <guy.j@maurel.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Randall Becker <rsbecker@nexbridge.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally reported after release of v2.35.2 (and other maint branches)
for CVE-2022-24765 and blocking otherwise harmless commands that were
done using sudo in a repository that was owned by the user.
Add a new test script with very basic support to allow running git
commands through sudo, so a reproduction could be implemented and that
uses only `git status` as a proxy of the issue reported.
Note that because of the way sudo interacts with the system, a much
more complete integration with the test framework will require a lot
more work and that was therefore intentionally punted for now.
The current implementation requires the execution of a special cleanup
function which should always be kept as the last "test" or otherwise
the standard cleanup functions will fail because they can't remove
the root owned directories that are used. This also means that if
failures are found while running, the specifics of the failure might
not be kept for further debugging and if the test was interrupted, it
will be necessary to clean the working directory manually before
restarting by running:
$ sudo rm -rf trash\ directory.t0034-root-safe-directory/
The test file also uses at least one initial "setup" test that creates
a parallel execution directory under the "root" sub directory, which
should be used as top level directory for all repositories that are
used in this test file. Unlike all other tests the repository provided
by the test framework should go unused.
Special care should be taken when invoking commands through sudo, since
the environment is otherwise independent from what the test framework
setup and might have changed the values for HOME, SHELL and dropped
several relevant environment variables for your test. Indeed `git status`
was used as a proxy because it doesn't even require commits in the
repository to work and usually doesn't require much from the environment
to run, but a future patch will add calls to `git init` and that will
fail to honor the default branch name, unless that setting is NOT
provided through an environment variable (which means even a CI run
could fail that test if enabled incorrectly).
A new SUDO prerequisite is provided that does some sanity checking
to make sure the sudo command that will be used allows for passwordless
execution as root without restrictions and doesn't mess with git's
execution path. This matches what is provided by the macOS agents that
are used as part of GitHub actions and probably nowhere else.
Most of those characteristics make this test mostly only suitable for
CI, but it might be executed locally if special care is taken to provide
for all of them in the local configuration and maybe making use of the
sudo credential cache by first invoking sudo, entering your password if
needed, and then invoking the test with:
$ GIT_TEST_ALLOW_SUDO=YES ./t0034-root-safe-directory.sh
If it fails to run, then it means your local setup wouldn't work for the
test because of the configuration sudo has or other system settings, and
things that might help are to comment out sudo's secure_path config, and
make sure that the account you are using has no restrictions on the
commands it can run through sudo, just like is provided for the user in
the CI.
For example (assuming a username of marta for you) something probably
similar to the following entry in your /etc/sudoers (or equivalent) file:
marta ALL=(ALL:ALL) NOPASSWD: ALL
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.32:
Git 2.32.1
Git 2.31.2
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.31:
Git 2.31.2
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
One CI task based on Fedora image noticed a not-quite-kosher
consturct recently, which has been corrected.
* vd/pthread-setspecific-g11-fix:
async_die_is_recursing: work around GCC v11.x issue on Fedora
This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
build, first appearing after the introduction of a new version of the Fedora
docker image version. This image includes a version of `glibc` with the
attribute `__attr_access_none` added to `pthread_setspecific` [1], the
implementation of which only exists for GCC 11.X - the version included in
the Fedora image. The attribute requires that the pointer provided in the
second argument of `pthread_getspecific` must, if not NULL, be a pointer to
a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
valid, causing the error.
This fix imitates a workaround added in SELinux [2] by using the pointer to
the static `async_die_counter` itself as the second argument to
`pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
intent of the current usage while not triggering the build error.
[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
[2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
baf8ec8d3a (rebase -r: don't write .git/MERGE_MSG when
fast-forwarding, 2021-08-20) stopped reading the author script in
run_git_commit() when rewording a commit. This is normally safe
because "git commit --amend" preserves the authorship. However if the
user passes "--committer-date-is-author-date" then we need to read the
author date from the author script when rewording. Fix this regression
by tightening the check for when it is safe to skip reading the author
script.
Reported-by: Jonas Kittner <jonas.kittner@ruhr-uni-bochum.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already note that we may produce invalid output when we skip calling
iconv() altogether. But we may also do so if iconv() fails, and we have
no good alternative. Let's document this to avoid surprising users.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit fd680bc5 (logmsg_reencode(): warn when iconv()
fails, 2021-08-27). Throwing a warning for each and every commit
that gets reencoded, without allowing a way to squelch, would make
it unpleasant for folks who have to deal with an ancient part of the
history in an old project that used wrong encoding in the commits.
Protocol v0 clients can get stuck parsing a malformed feature line.
* ah/connect-parse-feature-v0-fix:
connect: also update offset for features without values
"make clean" has been updated to remove leftover .depend/
directories, even when it is not told to use them to compute header
dependencies.
* ab/make-clean-depend-dirs:
Makefile: clean .depend dirs under COMPUTE_HEADER_DEPENDENCIES != yes
Sensitive data in the HTTP trace were supposed to be redacted, but
we failed to do so in HTTP/2 requests.
* jk/http-redact-fix:
http: match headers case-insensitively when redacting
"git cvsserver" had a long-standing bug in its authentication code,
which has finally been corrected (it is unclear and is a separate
question if anybody is seriously using it, though).
* cb/cvsserver:
Documentation: cleanup git-cvsserver
git-cvsserver: protect against NULL in crypt(3)
git-cvsserver: use crypt correctly to compare password hashes
"git clone" from a repository whose HEAD is unborn into a bare
repository didn't follow the branch name the other side used, which
is corrected.
* jk/clone-unborn-head-in-bare:
clone: handle unborn branch in bare repos
"git stash", where the tentative change involves changing a
directory to a file (or vice versa), was confused, which has been
corrected.
* en/stash-df-fix:
stash: restore untracked files AFTER restoring tracked files
stash: avoid feeding directories to update-index
t3903: document a pair of directory/file bugs
When "git am --abort" fails to abort correctly, it still exited
with exit status of 0, which has been corrected.
* en/am-abort-fix:
am: fix incorrect exit status on am fail to abort
t4151: add a few am --abort tests
git-am.txt: clarify --abort behavior
"git update-ref --stdin" failed to flush its output as needed,
which potentially led the conversation to a deadlock.
* ps/update-ref-batch-flush:
t1400: avoid SIGPIPE race condition on fifo
update-ref: fix streaming of status updates
The "mode" word is useless in a call to open(2) that does not
create a new file. Such a call in the files backend of the ref
subsystem has been cleaned up.
* rs/no-mode-to-open-when-appending:
refs/files-backend: remove unused open mode parameter
The order in which various files that make up a single (conceptual)
packfile has been reevaluated and straightened up. This matters in
correctness, as an incomplete set of files must not be shown to a
running Git.
* tb/pack-finalize-ordering:
pack-objects: rename .idx files into place after .bitmap files
pack-write: split up finish_tmp_packfile() function
builtin/index-pack.c: move `.idx` files into place last
index-pack: refactor renaming in final()
builtin/repack.c: move `.idx` files into place last
pack-write.c: rename `.idx` files after `*.rev`
pack-write: refactor renaming in finish_tmp_packfile()
bulk-checkin.c: store checksum directly
pack.h: line-wrap the definition of finish_tmp_packfile()
The code that optionally creates the *.rev reverse index file has
been optimized to avoid needless computation when it is not writing
the file out.
* ab/reverse-midx-optim:
pack-write: skip *.rev work when not writing *.rev
The "git apply -3" code path learned not to bother the lower level
merge machinery when the three-way merge can be trivially resolved
without the content level merge.
* jc/trivial-threeway-binary-merge:
apply: resolve trivial merge without hitting ll-merge with "--3way"
Doc update plus improved error reporting.
* jk/log-warn-on-bogus-encoding:
docs: use "character encoding" to refer to commit-object encoding
logmsg_reencode(): warn when iconv() fails
The output from "git fast-export", when its anonymization feature
is in use, showed an annotated tag incorrectly.
* tk/fast-export-anonymized-tag-fix:
fast-export: fix anonymized tag using original length
Even when running "git send-email" without its own threaded
discussion support, a threading related header in one message is
carried over to the subsequent message to result in an unwanted
threading, which has been corrected.
* mh/send-email-reset-in-reply-to:
send-email: avoid incorrect header propagation
Buggy tests could damage repositories outside the throw-away test
area we created. We now by default export GIT_CEILING_DIRECTORIES
to limit the damage from such a stray test.
* sg/set-ceiling-during-tests:
test-lib: set GIT_CEILING_DIRECTORIES to protect the surrounding repository
The sparse-index support can corrupt the index structure by storing
a stale and/or uninitialized data, which has been corrected.
* jh/sparse-index-resize-fix:
sparse-index: copy dir_hash in ensure_full_index()
"git upload-pack" which runs on the other side of "git fetch"
forgot to take the ref namespaces into account when handling
want-ref requests.
* ka/want-ref-in-namespace:
docs: clarify the interaction of transfer.hideRefs and namespaces
upload-pack.c: treat want-ref relative to namespace
t5730: introduce fetch command helper
Build update for Apple clang.
* cb/makefile-apple-clang:
build: catch clang that identifies itself as "$VENDOR clang"
build: clang version may not be followed by extra words
build: update detect-compiler for newer Xcode version
"git branch -D <branch>" used to refuse to remove a broken branch
ref that points at a missing commit, which has been corrected.
* rs/branch-allow-deleting-dangling:
branch: allow deleting dangling branches with --force
The delayed checkout code path in "git checkout" etc. were chatty
even when --quiet and/or --no-progress options were given.
* mt/quiet-with-delayed-checkout:
checkout: make delayed checkout respect --quiet and --no-progress
"git diff --relative" segfaulted and/or produced incorrect result
when there are unmerged paths.
* dd/diff-files-unmerged-fix:
diff-lib: ignore paths that are outside $cwd if --relative asked
mmap() imitation used to call xmalloc() that dies upon malloc()
failure, which has been corrected to just return an error to the
caller to be handled.
* rs/git-mmap-uses-malloc:
compat: let git_mmap use malloc(3) directly
Various bugs in "git rebase -r" have been fixed.
* pw/rebase-r-fixes:
rebase -r: fix merge -c with a merge strategy
rebase -r: don't write .git/MERGE_MSG when fast-forwarding
rebase -i: add another reword test
rebase -r: make 'merge -c' behave like reword
Checking out all the paths from HEAD during the last conflicted
step in "git rebase" and continuing would cause the step to be
skipped (which is expected), but leaves MERGE_MSG file behind in
$GIT_DIR and confuses the next "git commit", which has been
corrected.
* pw/rebase-skip-final-fix:
rebase --continue: remove .git/MERGE_MSG
rebase --apply: restore some tests
t3403: fix commit authorship
Use upload-artifacts v1 (instead of v2) for 32-bit linux, as the
new version has a blocker bug for that architecture.
* cb/ci-use-upload-artifacts-v1:
ci: use upload-artifacts v1 for dockerized jobs
"git commit --fixup" now works with "--edit" again, after it was
broken in v2.32.
* jk/commit-edit-fixup-fix:
commit: restore --edit when combined with --fixup
"git range-diff" code clean-up.
* jk/range-diff-fixes:
range-diff: use ssize_t for parsed "len" in read_patches()
range-diff: handle unterminated lines in read_patches()
range-diff: drop useless "offset" variable from read_patches()
"git apply" miscounted the bytes and failed to read to the end of
binary hunks.
* jk/apply-binary-hunk-parsing-fix:
apply: keep buffer/size pair in sync when parsing binary hunks
"git pull" had various corner cases that were not well thought out
around its --rebase backend, e.g. "git pull --ff-only" did not stop
but went ahead and rebased when the history on other side is not a
descendant of our history. The series tries to fix them up.
* en/pull-conflicting-options:
pull: fix handling of multiple heads
pull: update docs & code for option compatibility with rebasing
pull: abort by default when fast-forwarding is not possible
pull: make --rebase and --no-rebase override pull.ff=only
pull: since --ff-only overrides, handle it first
pull: abort if --ff-only is given and fast-forwarding is impossible
t7601: add tests of interactions with multiple merge heads and config
t7601: test interaction of merge/rebase/fast-forward flags and options
Bugfix for common ancestor negotiation recently introduced in "git
push" codepath.
* jt/push-negotiation-fixes:
fetch: die on invalid --negotiation-tip hash
send-pack: fix push nego. when remote has refs
send-pack: fix push.negotiate with remote helper
Input validation of "git pack-objects --stdin-packs" has been
corrected.
* ab/pack-stdin-packs-fix:
pack-objects: fix segfault in --stdin-packs option
pack-objects tests: cover blindspots in stdin handling
"git maintenance" scheduler fix for macOS.
* js/maintenance-launchctl-fix:
maintenance: skip bootout/bootstrap when plist is registered
maintenance: create `launchctl` configuration using a lock file
Update to the command line completion (in contrib/) for tcsh.
* ti/tcsh-completion-regression-fix:
completion: tcsh: Fix regression by drop of wrapper functions
Command line completion updates.
* fc/completion-updates:
completion: bash: add correct suffix in variables
completion: bash: fix for multiple dash commands
completion: bash: fix for suboptions with value
completion: bash: fix prefix detection in branch.*
Documentation updates.
* en/merge-strategy-docs:
Update error message and code comment
merge-strategies.txt: add coverage of the `ort` merge strategy
git-rebase.txt: correct out-of-date and misleading text about renames
merge-strategies.txt: fix simple capitalization error
merge-strategies.txt: avoid giving special preference to patience algorithm
merge-strategies.txt: do not imply using copy detection is desired
merge-strategies.txt: update wording for the resolve strategy
Documentation: edit awkward references to `git merge-recursive`
directory-rename-detection.txt: small updates due to merge-ort optimizations
git-rebase.txt: correct antiquated claims about --rebase-merges
parse_feature_value() takes an offset, and uses it to seek past the
point in features_list that we've already seen. However if the feature
being searched for does not specify a value, the offset is not
updated. Therefore if we call parse_feature_value() in a loop on a
value-less feature, we'll keep on parsing the same feature over and over
again. This usually isn't an issue: there's no point in using
next_server_feature_value() to search for repeated instances of the same
capability unless that capability typically specifies a value - but a
broken server could send a response that omits the value for a feature
even when we are expecting a value.
Therefore we add an offset update calculation for the no-value case,
which helps ensure that loops using next_server_feature_value() will
always terminate.
next_server_feature_value(), and the offset calculation, were first
added in 2.28 in 2c6a403d96 (connect: add function to parse multiple
v1 capability values, 2020-05-25).
Thanks to Peff for authoring the test.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The difftool dir-diff mode handles symlinks by replacing them with their
readlink(2) values. This allows diff tools to see changes to symlinks
as if they were regular text diffs with the old and new path values.
This is analogous to what "git diff" displays when symlinks change.
The temporary diff directories that are created initially contain
symlinks because they get checked-out using a temporary index that
retains the original symlinks as checked-in to the repository.
A bug was introduced when difftool was rewritten in C that made
difftool write the readlink(2) contents into the pointed-to file rather
than the symlink itself. The write was going through the symlink and
writing to its target rather than writing to the symlink path itself.
Replace symlinks with raw text files by unlinking the symlink path
before writing the readlink(2) content into them.
When 18ec800512 (difftool: handle modified symlinks in dir-diff mode,
2017-03-15) added handling for modified symlinks this bug got recorded
in the test suite. The tests included the pointed-to symlink target
paths. These paths were being reported because difftool was erroneously
writing to them, but they should have never been reported nor written.
Correct the modified-symlinks test cases by removing the target files
from the expected output.
Add a test to ensure that symlinks are written with the readlink(2)
values and that the target files contain their original content.
Reported-by: Alan Blotz <work@blotz.org>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When HTTP/2 is in use, we fail to correctly redact "Authorization" (and
other) headers in our GIT_TRACE_CURL output.
We get the headers in our CURLOPT_DEBUGFUNCTION callback, curl_trace().
It passes them along to curl_dump_header(), which in turn checks
redact_sensitive_header(). We see the headers as a text buffer like:
Host: ...
Authorization: Basic ...
After breaking it into lines, we match each header using skip_prefix().
This is case-sensitive, even though HTTP headers are case-insensitive.
This has worked reliably in the past because these headers are generated
by curl itself, which is predictable in what it sends.
But when HTTP/2 is in use, instead we get a lower-case "authorization:"
header, and we fail to match it. The fix is simple: we should match with
skip_iprefix().
Testing is more complicated, though. We do have a test for the redacting
feature, but we don't hit the problem case because our test Apache setup
does not understand HTTP/2. You can reproduce the issue by applying this
on top of the test change in this patch:
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index afa91e38b0..19267c7107 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -29,6 +29,9 @@ ErrorLog error.log
LoadModule setenvif_module modules/mod_setenvif.so
</IfModule>
+LoadModule http2_module modules/mod_http2.so
+Protocols h2c
+
<IfVersion < 2.4>
LockFile accept.lock
</IfVersion>
@@ -64,8 +67,8 @@ LockFile accept.lock
<IfModule !mod_access_compat.c>
LoadModule access_compat_module modules/mod_access_compat.so
</IfModule>
-<IfModule !mod_mpm_prefork.c>
- LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+<IfModule !mod_mpm_event.c>
+ LoadModule mpm_event_module modules/mod_mpm_event.so
</IfModule>
<IfModule !mod_unixd.c>
LoadModule unixd_module modules/mod_unixd.so
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 1c2a444ae7..ff74f0ae8a 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -24,6 +24,10 @@ test_expect_success 'create http-accessible bare repository' '
git push public main:main
'
+test_expect_success 'prefer http/2' '
+ git config --global http.version HTTP/2
+'
+
setup_askpass_helper
test_expect_success 'clone http repository' '
but this has a few issues:
- it's not necessarily portable. The http2 apache module might not be
available on all systems. Further, the http2 module isn't compatible
with the prefork mpm, so we have to switch to something else. But we
don't necessarily know what's available. It would be nice if we
could have conditional config, but IfModule only tells us if a
module is already loaded, not whether it is available at all.
This might be a non-issue. The http tests are already optional, and
modern-enough systems may just have both of these. But...
- if we do this, then we'd no longer be testing HTTP/1.1 at all. I'm
not sure how much that matters since it's all handled by curl under
the hood, but I'd worry that some detail leaks through. We'd
probably want two scripts running similar tests, one with HTTP/2 and
one with HTTP/1.1.
- speaking of which, a later test fails with the patch above! The
problem is that it is making sure we used a chunked
transfer-encoding by looking for that header in the trace. But
HTTP/2 doesn't support that, as it has its own streaming mechanisms
(the overall operation works fine; we just don't see the header in
the trace).
Furthermore, even with the changes above, this test still does not
detect the current failure, because we see _both_ HTTP/1.1 and HTTP/2
requests, which confuse it. Quoting only the interesting bits from the
resulting trace file, we first see:
=> Send header: GET /auth/smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1
=> Send header: Connection: Upgrade, HTTP2-Settings
=> Send header: Upgrade: h2c
=> Send header: HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
<= Recv header: HTTP/1.1 401 Unauthorized
<= Recv header: Date: Wed, 22 Sep 2021 20:03:32 GMT
<= Recv header: Server: Apache/2.4.49 (Debian)
<= Recv header: WWW-Authenticate: Basic realm="git-auth"
So the client asks for HTTP/2, but Apache does not do the upgrade for
the 401 response. Then the client repeats with credentials:
=> Send header: GET /auth/smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1
=> Send header: Authorization: Basic <redacted>
=> Send header: Connection: Upgrade, HTTP2-Settings
=> Send header: Upgrade: h2c
=> Send header: HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
<= Recv header: HTTP/1.1 101 Switching Protocols
<= Recv header: Upgrade: h2c
<= Recv header: Connection: Upgrade
<= Recv header: HTTP/2 200
<= Recv header: content-type: application/x-git-upload-pack-advertisement
So the client does properly redact there, because we're speaking
HTTP/1.1, and the server indicates it can do the upgrade. And then the
client will make further requests using HTTP/2:
=> Send header: POST /auth/smart/repo.git/git-upload-pack HTTP/2
=> Send header: authorization: Basic dXNlckBob3N0OnBhc3NAaG9zdA==
=> Send header: content-type: application/x-git-upload-pack-request
And there we can see that the credential is _not_ redacted. This part of
the test is what gets confused:
# Ensure that there is no "Basic" followed by a base64 string, but that
# the auth details are redacted
! grep "Authorization: Basic [0-9a-zA-Z+/]" trace &&
grep "Authorization: Basic <redacted>" trace
The first grep does not match the un-redacted HTTP/2 header, because
it insists on an uppercase "A". And the second one does find the
HTTP/1.1 header. So as far as the test is concerned, everything is OK,
but it failed to notice the un-redacted lines.
We can make this test (and the other related ones) more robust by adding
"-i" to grep case-insensitively. This isn't really doing anything for
now, since we're not actually speaking HTTP/2, but it future-proofs the
tests for a day when we do (either we add explicit HTTP/2 test support,
or it's eventually enabled by default by our Apache+curl test setup).
And it doesn't hurt in the meantime for the tests to be more careful.
The change to use "grep -i", coupled with the changes to use HTTP/2
shown above, causes the test to fail with the current code, and pass
after this patch is applied.
And finally, there's one other way to demonstrate the issue (and how I
actually found it originally). Looking at GIT_TRACE_CURL output against
github.com, you'll see the unredacted output, even if you didn't set
http.version. That's because setting it is only necessary for curl to
send the extra headers in its HTTP/1.1 request that say "Hey, I speak
HTTP/2; upgrade if you do, too". But for a production site speaking
https, the server advertises via ALPN, a TLS extension, that it supports
HTTP/2, and the client can immediately start using it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a logic error in dfea575017 (Makefile: lazily compute header
dependencies, 2010-01-26) where we'd make whether we cleaned the
.depend dirs contingent on the currently configured
COMPUTE_HEADER_DEPENDENCIES value. Before this running e.g.:
make COMPUTE_HEADER_DEPENDENCIES=yes grep.o
make COMPUTE_HEADER_DEPENDENCIES=no clean
Would leave behind the .depend directory, now it'll be removed.
Normally we'd need to use another variable, but in this case there's
no other uses of $(dep_dirs), as opposed to $(dep_args) which is used
as an argument to $(CC). So just deleting this line makes everything
work correctly.
See http://lore.kernel.org/git/xmqqmto48ufz.fsf@gitster.g for a report
about this issue.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning a repository with an unborn HEAD, we'll set the local HEAD
to match it only if the local repository is non-bare. This is
inconsistent with all other combinations:
remote HEAD | local repo | local HEAD
-----------------------------------------------
points to commit | non-bare | same as remote
points to commit | bare | same as remote
unborn | non-bare | same as remote
unborn | bare | local default
So I don't think this is some clever or subtle behavior, but just a bug
in 4f37d45706 (clone: respect remote unborn HEAD, 2021-02-05). And it's
easy to see how we ended up there. Before that commit, the code to set
up the HEAD for an empty repo was guarded by "if (!option_bare)". That's
because the only thing it did was call install_branch_config(), and we
don't want to do so for a bare repository (unborn HEAD or not).
That commit put the handling of unborn HEADs into the same block, since
those also need to call install_branch_config(). But the unborn case has
an additional side effect of calling create_symref(), and we want that
to happen whether we are bare or not.
This patch just pulls all of the "figure out the default branch" code
out of the "!option_bare" block. Only the actual config installation is
kept there.
Note that this does mean we might allocate "ref" and not use it (if the
remote is empty but did not advertise an unborn HEAD). But that's not
really a big deal since this isn't a hot code path, and it keeps the
code simple. The alternative would be handling unborn_head_target
separately, but that gets confusing since its memory ownership is
tangled up with the "ref" variable.
There's just one new test, for the case we're fixing. The other ones in
the table are handled elsewhere (the unborn non-bare case just above,
and the actually-born cases in t5601, t5606, and t5609, as they do not
require v2's "unborn" protocol extension).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Not sure what happened, but the comment is describing code elsewhere in
the file. Fix the comment to actually discuss the code that follows.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a few typos and alignment issues, and while at it update the
example hashes to show most of the ones available in recent crypt(3).
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some versions of crypt(3) will return NULL when passed an unsupported
hash type (ex: OpenBSD with DES), so check for undef instead of using
it directly.
Also use this to probe the system and select a better hash function in
the tests, so it can pass successfully.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
[jc: <CAPUEspjqD5zy8TLuFA96usU7FYi=0wF84y7NgOVFqegtxL9zbw@mail.gmail.com>]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
c057bad370 (git-cvsserver: use a password file cvsserver pserver,
2010-05-15) adds a way for `git cvsserver` to provide authenticated
pserver accounts without having clear text passwords, but uses the
username instead of the password to the call for crypt(3).
Correct that, and make sure the documentation correctly indicates how
to obtain hashed passwords that could be used to populate this
configuration, as well as correcting the hash that was used for the
tests.
This change will require that any user of this feature updates the
hashes in their configuration, but has the advantage of using a more
similar format than cvs uses, probably also easying any migration.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
9af0b8dbe2 (t0000-basic: more commit-tree tests., 2006-04-26) adds
tests for commit-tree that mask the return exit from git as described
in a378fee5b0 (Documentation: add shell guidelines, 2018-10-05).
Fix the tests, to avoid pipes by using a temporary file instead.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b8ba412bf7 (tree-diff: avoid alloca for large allocations, 2016-06-07)
adds a way to route some bigger allocations out of the stack and free
them through the addition of two conveniently named macros, but leaves
the calls to free the xalloca part, which could be also in the heap,
if the system doesn't HAVE_ALLOCA_H (ex: macOS and other BSD).
Add the missing free call, xalloca_free(), which is a noop if we
allocated memory in the stack frame, but a real free() if we
allocated in the heap instead, and while at it, change the expression
to match in both macros for ease of readability.
This avoids a leak reported by LSAN while running t0000 but that
wouldn't fail the test (which is fixed in the next patch):
SUMMARY: LeakSanitizer: 1034 byte(s) leaked in 15 allocation(s).
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Time complexities for pack_pos_to_midx and midx_to_pack_pos are swapped,
correct it.
Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1400.190 sometimes fails or even hangs because of the way it uses
fifos. Our goal is to interactively read and write lines from
update-ref, so we have two fifos, in and out. We open a descriptor
connected to "in" and redirect output to that, so that update-ref does
not see EOF as it would if we opened and closed it for each "echo" call.
But we don't do the same for the output. This leads to a race where our
"read response <out" has not yet opened the fifo, but update-ref tries
to write to it and gets SIGPIPE. This can result in the test failing, or
worse, hanging as we wait forever for somebody to write to the pipe.
This is the same proble we fixed in 4783e7ea83 (t0008: avoid SIGPIPE
race condition on fifo, 2013-07-12), and we can fix it the same way, by
opening a second long-running descriptor.
Before this patch, running:
./t1400-update-ref.sh --run=1,190 --stress
failed or hung within a few dozen iterations. After it, I ran it for
several hundred without problems.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git version' is probably the least complex git command,
it is a non-experimental user-facing builtin command. As such
it should have a help page.
Both `git help` and `git version` can be called as options
(`--help`/`--version`) that internally get converted to the
corresponding command. Add a small paragraph to
Documentation/git.txt describing how these two options
interact with each other and link to this help page for the
sub-options that `--version` can take. Well, currently there
is only one sub-option, but that could potentially increase
in future versions of Git.
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git help` command gained the ability to list config variables in
3ac68a93fd (help: add --config to list all available config, 2018-05-26)
but failed to tell readers of the config documenation itself.
Provide that cross reference.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We converted argv_array (which later became strvec) to use size_t in
819f0e76b1 (argv-array: use size_t for count and alloc, 2020-07-28) in
order to avoid the possibility of integer overflow. But later, commit
d70a9eb611 (strvec: rename struct fields, 2020-07-28) accidentally
converted these back to ints!
Those two commits were part of the same patch series. I'm pretty sure
what happened is that they were originally written in the opposite order
and then cleaned up and re-ordered during an interactive rebase. And
when resolving the inevitable conflict, I mistakenly took the "rename"
patch completely, accidentally dropping the type change.
We can correct it now; better late than never.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 8de7eeb54b (compression: unify pack.compression configuration
parsing, 2016-11-15) the variables core_compression_level and
core_compression_seen are only set, but never read. Remove them.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both Johannes and I assumed (perhaps due to familiarity with rebase)
that am --abort would return the user to a clean state. However, since
am, unlike rebase, is intended to be used within a dirty working tree,
--abort will only clean the files involved in the am operation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user deletes a file and places a directory of untracked files
there, then stashes all these changes, the untracked directory of files
cannot be restored until after the corresponding file in the way is
removed. So, restore changes to tracked files before restoring
untracked files.
There is no counterpart problem to worry about with the user deleting an
untracked file and then add a tracked one in its place. Git does not
track untracked files, and so will not know the untracked file was
deleted, and thus won't be able to stash the removal of that file.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a file is removed from the cache, but there is a file of the same
name present in the working directory, we would normally treat that file
in the working directory as untracked. However, in the case of stash,
doing that would prevent a simple 'git stash push', because the untracked
file would be in the way of restoring the deleted file.
git stash, however, blindly assumes that whatever is in the working
directory for a deleted file is wanted and passes that path along to
update-index. That causes problems when the working directory contains
a directory with the same name as the deleted file. Add some code for
this special case that will avoid passing directory names to
update-index.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three tests here, because the second bug is documented with
two tests: a file -> directory change and a directory -> file change.
The reason for the two tests is just to verify that both are indeed
broken but that both will be fixed by the same simple change (which will
be provided in a subsequent patch).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preceding commits the race of renaming .idx files in place before
.rev files and other auxiliary files was fixed in pack-write.c's
finish_tmp_packfile(), builtin/repack.c's "struct exts", and
builtin/index-pack.c's final(). As noted in the change to pack-write.c
we left in place the issue of writing *.bitmap files after the *.idx,
let's fix that issue.
See 7cc8f97108 (pack-objects: implement bitmap writing, 2013-12-21)
for commentary at the time when *.bitmap was implemented about how
those files are written out, nothing in that commit contradicts what's
being done here.
Note that this commit and preceding ones only close any race condition
with *.idx files being written before their auxiliary files if we're
optimistic about our lack of fsync()-ing in this are not tripping us
over. See the thread at [1] for a rabbit hole of various discussions
about filesystem races in the face of doing and not doing fsync() (and
if doing fsync(), not doing it properly).
We may want to fsync the containing directory once after renaming the
*.idx file into place, but that is outside the scope of this series.
1. https://lore.kernel.org/git/8735qgkvv1.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split up the finish_tmp_packfile() function and use the split-up version
in pack-objects.c in preparation for moving the step of renaming the
*.idx file later as part of a function change.
Since the only other caller of finish_tmp_packfile() was in
bulk-checkin.c, and it won't be needing a change to its *.idx renaming,
provide a thin wrapper for the old function as a static function in that
file. If other callers end up needing the simpler version it could be
moved back to "pack-write.c" and "pack.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as preceding patches to `git repack` and `git
pack-objects`, fix the identical problem in `git index-pack`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the renaming in final() into a helper function, this is
similar in spirit to a preceding refactoring of finish_tmp_packfile()
in pack-write.c.
Before e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25) it probably wasn't worth it to have this sort of helper,
due to the differing "else if" case for "pack" files v.s. "idx" files.
But since we've got "rev" as well now, let's do the renaming via a
helper, this is both a net decrease in lines, and improves the
readability, since we can easily see at a glance that the logic for
writing these three types of files is exactly the same, aside from the
obviously differing cases of "*final_name" being NULL, and
"make_read_only_if_same" being different.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as the previous patch, fix the identical problem
from `git repack` (which invokes `pack-objects` with a temporary
location for output, and then moves the files into their final locations
itself).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We treat the presence of an `.idx` file as the indicator of whether or
not it's safe to use a packfile. But `finish_tmp_packfile()` (which is
responsible for writing the pack and moving the temporary versions of
all of its auxiliary files into place) is inconsistent about the write
order.
Specifically, it moves the `.rev` file into place after the `.idx`,
leaving open the possibility to open a pack which looks "ready" (because
the `.idx` file exists and is readable) but appears momentarily to not
have a `.rev` file. This causes Git to fall back to generating the
pack's reverse index in memory.
Though racy, this amounts to an unnecessary slow-down at worst, and
doesn't affect the correctness of the resulting reverse index.
Close this race by moving the .rev file into place before moving the
.idx file into place.
This still leaves the issue of `.idx` files being renamed into place
before the auxiliary `.bitmap` file is renamed when in pack-object.c's
write_pack_file() "write_bitmap_index" is true. That race will be
addressed in subsequent commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the renaming in finish_tmp_packfile() into a helper function.
The callers are now expected to pass a "name_buffer" ending in
"pack-OID." instead of the previous "pack-", we then append "pack",
"idx" or "rev" to it.
By doing the strbuf_setlen() in rename_tmp_packfile() we reuse the
buffer and avoid the repeated allocations we'd get if that function had
its own temporary "struct strbuf".
This approach of reusing the buffer does make the last user in
pack-object.c's write_pack_file() slightly awkward, since we needlessly
do a strbuf_setlen() before calling strbuf_release() for consistency. In
subsequent changes we'll move that bitmap writing code around, so let's
not skip the strbuf_setlen() now.
The previous strbuf_reset() idiom originated with 5889271114
(finish_tmp_packfile():use strbuf for pathname construction,
2014-03-03), which in turn was a minimal adjustment of pre-strbuf code
added in 0e990530ae (finish_tmp_packfile(): a helper function,
2011-10-28).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
finish_bulk_checkin() stores the checksum from finalize_hashfile() by
writing to the `hash` member of `struct object_id`, but that hash has
nothing to do with an object id (it's just a convenient location that
happens to be sized correctly).
Store the hash directly in an unsigned char array. This behaves the same
as writing to the `hash` member, but makes the intent clearer (and
avoids allocating an extra four bytes for the `algo` member of `struct
object_id`). It likewise prevents the possibility of a segfault when
reading `algo` (e.g., by calling `oid_to_hex()`) if it is uninitialized.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The t5562 script occasionally takes 60 extra seconds to complete due to
a race condition in the invoke-with-content-length.pl helper.
The way it's supposed to work is this:
- we set up a SIGCLD handler
- we kick off http-backend and write to it with a set content-length,
but _don't_ close the pipe
- we sleep for 60 seconds, assuming that SIGCLD from http-backend
finishing will interrupt us
- after the sleep finishes (whetherby 60 seconds or because it was
interrupted by the signal), we check a flag to see if our SIGCLD
handler was called. If not, then we complain.
This usually completes immediately, because the signal interrupts our
sleep. But very occasionally the child process dies _before_ we hit the
sleep, so we don't realize it. The test still completes successfully
(because our $exited flag is set), but it takes an extra 60 seconds.
There's no way to check the flag and sleep atomically. So the best we
can do with this approach is to sleep in smaller chunks (say, 1 second)
and check the flag incrementally. Then we waste a maximum of 1 second if
we lose the race. This was proposed in:
https://lore.kernel.org/git/20190218205028.32486-1-max@max630.net/
and it does work. But we can do better.
Instead of blocking on sleep and waiting for the child signal to
interrupt us, we can block on the child exiting and set an alarm signal
to trigger the timeout.
This lets us exit the script immediately when the child behaves (with no
race possible), and wait a maximum of 60 seconds when it doesn't.
Note one small subtlety: perl is very willing to restart the waitpid()
call after the alarm is delivered, even if we've thrown an exception via
die. "perldoc -f alarm" claims you can get around this with an eval/die
combo (and even has some example code), but it doesn't seem to work for
me with waitpid(); instead, we continue waiting until the child exits.
So instead, we'll instruct the child process to exit in the alarm
handler itself. In the original code this was done by calling
close($out). That would continue to work, since our child is always
http-backend, which should exit when its stdin closes. But we can be
even more robust against a hung or confused child by sending a KILL
signal, which should terminate it immediately.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only need to provide a mode if we are willing to let open(2) create
the file, which is not the case here, so drop the unnecessary parameter.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the catch-all error message with specific ones for opening and
duplicating by calling the wrappers xopen and xdup. The code becomes
easier to follow when error handling is reduced to two letters.
Remove the unnecessary mode parameter while at it -- we expect /dev/null
to already exist.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Line-wrap the definition of finish_tmp_packfile() to make subsequent
changes easier to read. See 0e990530ae (finish_tmp_packfile(): a
helper function, 2011-10-28) for the commit that introduced this
overly long line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a performance regression introduced in a587b5a786 (pack-write.c:
extract 'write_rev_file_order', 2021-03-30) and stop needlessly
allocating the "pack_order" array and sorting it with
"pack_order_cmp()", only to throw that work away when we discover that
we're not writing *.rev files after all.
This redundant work was not present in the original version of this
code added in 8ef50d9958 (pack-write.c: prepare to write 'pack-*.rev'
files, 2021-01-25). There we'd call write_rev_file() from
e.g. finish_tmp_packfile(), but we'd "return NULL" early in
write_rev_file() if not doing a "WRITE_REV" or "WRITE_REV_VERIFY".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back when a1be47e4 (hash-object: fix buffer reuse with --path in a
subdirectory, 2017-03-20) was written, the prefix_filename() helper
used a static piece of memory to the caller, making the caller
responsible for copying it, if it wants to keep it across another
call to the same function. Two callers of the prefix_filename() in
hash-object were made to xstrdup() the value obtained from it.
But in the same series, when e4da43b1 (prefix_filename: return newly
allocated string, 2017-03-20) changed the rule to gave the caller
possession of the memory, we forgot to revert one of the xstrdup()
changes, allowing the returned value to leak.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git bugreport writes bug report to the current directory by default,
instead of repository root.
Fix the documentation.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This script was added in f28ac70f48 (Move all dashed-form commands to
libexecdir, 2007-11-28) when commands such as "git-add" lived in the
bin directory, instead of the git exec directory.
This notice helped someone incorrectly installing version v1.6.0 and
later into a directory built for a pre-v1.6.0 git version.
We're now long past the point where anyone who'd be helped by this
warning is likely to be doing that, so let's just remove this check
and warning to simplify the Makefile.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a regression in my c95e3a3f0b (send-email: move trivial config
handling to Perl, 2021-05-28) where we'd pick the first config key out
of multiple defined ones, instead of using the normal "last key wins"
semantics of "git config --get".
This broke e.g. cases where a .git/config would have a different
sendemail.smtpServer than ~/.gitconfig. We'd pick the ~/.gitconfig
over .git/config, instead of preferring the repository-local
version. The same would go for /etc/gitconfig etc.
The full list of impacted config keys (the %config_settings values
which are references to scalars, not arrays) is:
sendemail.smtpencryption
sendemail.smtpserver
sendemail.smtpserverport
sendemail.smtpuser
sendemail.smtppass
sendemail.smtpdomain
sendemail.smtpauth
sendemail.smtpbatchsize
sendemail.smtprelogindelay
sendemail.tocmd
sendemail.cccmd
sendemail.aliasfiletype
sendemail.envelopesender
sendemail.confirm
sendemail.from
sendemail.assume8bitencoding
sendemail.composeencoding
sendemail.transferencoding
sendemail.sendmailcmd
I.e. having any of these set in say ~/.gitconfig and in-repo
.git/config regressed in v2.33.0 to prefer the --global one over the
--local.
To test this add a test of config priority to one of these config
variables, most don't have tests at all, but there was an existing one
for sendemail.8bitEncoding.
The "git config" (instead of "test_config") is somewhat of an
anti-pattern, but follows established conventions in
t9001-send-email.sh, likewise with any other pattern or idiom in this
test.
The populating of home/.gitconfig and setting of HOME= is copied from
a test in t0017-env-helper.sh added in 1ff750b128 (tests: make
GIT_TEST_GETTEXT_POISON a boolean, 2019-06-21). This test fails
without this bugfix, but now it works.
Reported-by: Eli Schwartz <eschwartz@archlinux.org>
Tested-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
output() reuses the same struct diff_options for multiple calls of
diff_flush(). Set the option no_free to instruct it to keep the
ignore regexes between calls and release them explicitly at the end.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes 19b2517f (diff-merges: move specific diff-index "-m"
handling to diff-index, 2021-05-21).
That commit disabled handling of all diff for merges options in
diff-index on an assumption that they are unused. However, it later
appeared that -c and --cc, even though undocumented and not being
covered by tests, happen to have had particular effect on diff-index
output.
Restore original -c/--cc options handling by diff-index.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ll_binary_merge() function assumes that the ancestor blob is
different from either side of the new versions, and always fails
the merge in conflict, unless -Xours or -Xtheirs is in effect.
The normal "merge" machineries all resolve the trivial cases
(e.g. if our side changed while their side did not, the result
is ours) without triggering the file-level merge drivers, so the
assumption is warranted.
The code path in "git apply --3way", however, does not check for
the trivial three-way merge situation and always calls the
file-level merge drivers. This used to be perfectly OK back
when we always first attempted a straight patch application and
used the three-way code path only as a fallback. Any binary
patch that can be applied as a trivial three-way merge (e.g. the
patch is based exactly on the version we happen to have) would
always cleanly apply, so the ll_binary_merge() that is not
prepared to see the trivial case would not have to handle such a
case.
This no longer is true after we made "--3way" to mean "first try
three-way and then fall back to straight application", and made
"git apply -3" on a binary patch that is based on the current
version no longer apply.
Teach "git apply -3" to first check for the trivial merge cases
and resolve them without hitting the file-level merge drivers.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
[jc: stolen tests from Jerry's patch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing git-update-ref(1) with the `--stdin` flag, then the user
can queue updates and, since e48cf33b61 (update-ref: implement
interactive transaction handling, 2020-04-02), interactively drive the
transaction's state via a set of transactional verbs. This interactivity
is somewhat broken though: while the caller can use these verbs to drive
the transaction's state, the status messages which confirm that a verb
has been processed is not flushed. The caller may thus be left hanging
waiting for the acknowledgement.
Fix the bug by flushing stdout after writing the status update. Add a
test which catches this bug.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In make_remote(), we store the return value of hashmap_put() and check
it using assert(), but don't otherwise use it. If Git is compiled with
NDEBUG, then the assert() becomes a noop, and nobody looks at the
variable at all. This causes some compilers to produce warnings.
Let's switch it instead to a BUG(). This accomplishes the same thing,
but is always compiled in (and we don't have to worry about the cost;
the check is cheap, and this is not a hot code path).
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the trailing dot from the warning we emit about gc.log. It's
common for various terminal UX's to allow the user to select "words",
and by including the trailing dot a user wanting to select the path to
gc.log will need to manually remove the trailing dot.
Such a user would also probably need to adjust the path if it e.g. had
spaces in it, but this should address this very common case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Suggested-by: Jan Judas <snugar.i@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expand the section about namespaces in the documentation of
`transfer.hideRefs` to point out the subtle differences between
`upload-pack` and `receive-pack`.
ffcfb68176 (upload-pack.c: treat want-ref relative to namespace,
2021-07-30) taught `upload-pack` to reject `want-ref`s for hidden refs,
which is now mentioned. It is clarified that at no point the name of a
hidden ref is revealed, but the object id it points to may.
Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'upload-pack' runs within the context of a git namespace, treat any
'want-ref' lines the client sends as relative to that namespace.
Also check if the wanted ref is hidden via 'hideRefs'. If it is hidden,
respond with an error as if the ref didn't exist.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Assembling a "raw" fetch command to be fed directly to "test-tool serve-v2"
is extracted into a test helper.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the free_mailmap_entry() code added in 0925ce4d49 (Add map_user()
and clear_mailmap() to mailmap, 2009-02-08) the intent was clearly to
clear the "me" structure, but while we freed parts of the
mailmap_entry structure, we didn't free the structure itself. The same
goes for the "mailmap_info" structure.
This brings the number of SANITIZE=leak failures in t4203-mailmap.sh
down from 50 to 49. Not really progress as far as the number of
failures is concerned, but as far as I can tell this fixes all leaks
in mailmap.c itself. There's still users of it such as builtin/log.c
that call read_mailmap() without a clear_mailmap(), but that's on
them.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7f40759496 (fast-export: tighten anonymize_mem() interface to
handle only strings, 2020-06-23) changed the interface used in anonymizing
strings, but failed to update the size of annotated tag messages to match
the new anonymized string.
As a result, exporting tags having messages longer than 13 characters
would create output that couldn't be parsed by fast-import,
as the data length indicated was larger than the data output.
Reset the message size when anonymizing, and add a tag with a "long"
message to the test.
Signed-off-by: Tal Kelrich <hasturkun@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in a2ba162cda (object-info: support for retrieving
object info, 2021-04-20) which appears to have been based on a
misunderstanding of how the pkt-line.c API works. There is no need to
strdup() input to packet_writer_write(), it's just a printf()-like
format function.
This fixes a potentially large memory leak, since the number of OID
lines the "object-info" call can be arbitrarily large (or a small one
if the request is small).
This makes t5701-git-serve.sh pass again under SANITIZE=leak, as it
did before a2ba162cda.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Bruno Albuquerque <bga@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If multiple independent patches are sent with send-email, even if the
"In-Reply-To" and "References" headers are not managed by --thread or
--in-reply-to, their values may be propagated from prior patches to
subsequent patches with no such headers defined.
To mitigate this and potential future issues, make sure all global
patch-specific variables are always either handled by
command-specific code (e.g. threading), or are reset to their default
values for every iteration.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Marvin Häuser <mhaeuser@posteo.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every once in a while a test somehow manages to escape from its trash
directory and modifies the surrounding repository, whether because of
a bug in git itself, a bug in a test [1], or e.g. when trying to run
tests with a shell that is, in general, unable to run our tests [2].
Set GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/.." as an additional
safety measure to protect the surrounding repository at least from
modifications by git commands executed in the tests (assuming that
handling of ceiling directories during repository discovery is not
broken, and, of course, it won't save us from regular shell commands,
e.g. 'cd .. && rm -f ...').
[1] e.g. https://public-inbox.org/git/20210423051255.GD2947267@szeder.dev
[2] $ git symbolic-ref HEAD
refs/heads/master
$ ksh ./t2011-checkout-invalid-head.sh
[... a lot of "not ok" ...]
$ git symbolic-ref HEAD
refs/heads/other
(In short: 'ksh' doesn't support the 'local' builtin command,
which is used by 'test_oid', causing it to return with error
whenever it's called, leaving ZERO_OID set to empty, so when the
test 'checkout main from invalid HEAD' runs 'echo $ZERO_OID
>.git/HEAD' it writes a corrupt (not invalid) HEAD, and subsequent
git commands don't recognize the repository in the trash directory
anymore, but operate on the surrounding repo.)
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix syntax and correct the format of printf in MyFirstObjectWalk.txt
Signed-off-by: Zoker <kaixuanguiqu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Copy the 'index_state->dir_hash' back to the real istate after expanding
a sparse index.
A crash was observed in 'git status' during some hashmap lookups with
corrupted hashmap entries. During an index expansion, new cache-entries
are added to the 'index_state->name_hash' and the 'dir_hash' in a
temporary 'index_state' variable 'full'. However, only the 'name_hash'
hashmap from this temp variable was copied back into the real 'istate'
variable. The original copy of the 'dir_hash' was incorrectly
preserved. If the table in the 'full->dir_hash' hashmap were realloced,
the stale version (in 'istate') would be corrupted.
The test suite does not operate on index sizes sufficiently large to
trigger this reallocation, so they do not cover this behavior.
Increasing the test suite to cover such scale is fragile and likely
wasteful.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git branch only allows deleting branches that point to valid commits.
Skip that check if --force is given, as the caller is indicating with
it that they know what they are doing and accept the consequences.
This allows deleting dangling branches, which previously had to be
reset to a valid start-point using --force first.
Reported-by: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the struct object_id on instead of just its hash member.
This is simpler and avoids the need to guess the algorithm.
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only one of the callers of rev_is_head() provides two hashes to compare.
Move that check there and convert it to struct object_id.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "encoding" can mean a lot of things (e.g., base64 or
quoted-printable encoding in emails, HTML entities, URL encoding, and so
on). The documentation for i18n.commitEncoding and i18n.logOutputEncoding
uses the phrase "character encoding" to make this more clear.
Let's use that phrase in other places to make it clear what kind of
encoding we are talking about. This patch covers the gui.encoding
option, as well as the --encoding option for git-log, etc (in this
latter case, I word-smithed the sentence a little at the same time).
That, coupled with the mention of iconv in the --encoding description,
should make this more clear.
The other spot I looked at is the working-tree-encoding section of
gitattributes(5). But it gives specific examples of encodings that I
think make the meaning pretty clear already.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user asks for a pretty-printed commit to be converted (either
explicitly with --encoding=foo, or implicitly because the commit is
non-utf8 and we want to convert it), we pass it through iconv(). If that
fails, we fall back to showing the input verbatim, but don't tell the
user that the output may be bogus.
Let's add a warning to do so, along with a mention in the documentation
for --encoding. Two things to note about the implementation:
- we could produce the warning closer to the call to iconv() in
reencode_string_len(), which would let us relay the value of errno.
But this is not actually very helpful. reencode_string_len() does
not know we are operating on a commit, and indeed does not know that
the caller won't produce an error of its own. And the errno values
from iconv() are seldom helpful (iconv_open() only ever produces
EINVAL; perhaps EILSEQ from iconv() might be illuminating, but it
can also return EINVAL for incomplete sequences).
- if the reason for the failure is that the output charset is not
supported, then the user will see this warning for every commit we
try to display. That might be ugly and overwhelming, but on the
other hand it is making it clear that every one of them has not been
converted (and the likely outcome anyway is to re-try the command
with a supported output encoding).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'Filtering contents...' progress report from delayed checkout is
displayed even when checkout and clone are invoked with --quiet or
--no-progress. Furthermore, it is displayed unconditionally, without
first checking whether stdout is a tty. Let's fix these issues and also
add some regression tests for the two code paths that currently use
delayed checkout: unpack_trees.c:check_updates() and
builtin/checkout.c:checkout_worktree().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git column's '--nl' option can be used to specify a "string to be
printed at the end of each line" (quoting the man page), but this
option and its mandatory argument has been parsed as OPT_INTEGER since
the introduction of the command in 7e29b8254f (Add column layout
skeleton and git-column, 2012-04-21). Consequently, any non-number
argument is rejected by parse-options, and any number other than 0
leads to segfault:
$ printf "%s\n" one two |git column --mode=plain --nl=foo
error: option `nl' expects a numerical value
$ printf "%s\n" one two |git column --mode=plain --nl=42
Segmentation fault (core dumped)
$ printf "%s\n" one two |git column --mode=plain --nl=0
one
two
Parse this option as OPT_STRING.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for using xopen() instead of calling
open(2) and die() or die_errno() explicitly. This makes the error
messages more consistent and shortens the code.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the flags O_CREAT and O_EXCL are both given then open(2) is supposed
to create the file and error out if it already exists. The error
message in that case looks like this:
fatal: could not open 'foo' for writing: File exists
Without further context this is confusing: Why should the existence of
the file pose a problem? Isn't that a requirement for writing to it?
Add a more specific error message for that case to tell the user that we
actually don't expect the file to preexist, so the example becomes:
fatal: unable to create 'foo': File exists
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For diff family commands, we can tell them to exclude changes outside
of some directories if --relative is requested.
In diff_unmerge(), NULL will be returned if the requested path is
outside of the interesting directories, thus we'll run into NULL
pointer dereference in run_diff_files when trying to dereference
its return value.
Checking for return value of diff_unmerge before dereferencing
is not sufficient, though. Since, diff engine will try to work on such
pathspec later.
Let's not run diff on those unintesting entries, instead.
As a side effect, by skipping like that, we can save some CPU cycles.
Reported-by: Thomas De Zeeuw <thomas@slight.dev>
Tested-by: Carlo Arenas <carenas@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In test_atom(), we're piping the output of cat-file to tail(1),
thus, losing its exit status.
Let's use a temporary file to preserve git exit status code.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t6300, some tests are guarded behind some prerequisites.
Thus, objects created by those tests ain't available if those
prerequisites are unsatistified. Attempting to run "cat-file"
on those objects will run into failure.
In fact, running t6300 in an environment without gpg(1),
we'll see those warnings:
fatal: Not a valid object name refs/tags/signed-empty
fatal: Not a valid object name refs/tags/signed-short
fatal: Not a valid object name refs/tags/signed-long
Let's put those commands into the real tests, in order to:
* skip their execution if prerequisites aren't satistified.
* check their exit status code
The expected value for objects with type: commit needs to be
computed outside the test because we can't rely on "$3" there.
Furthermore, to prevent the accidental usage of that computed
expected value, BUG out on unknown object's type.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0696232390 (pack-redundant: fix crash when one packfile in repo,
2020-12-16) added one some new tests to t5323. At the time, the sub-repo
we used was called "master". But in a parallel branch, this was switched
to "main".
When the latter branch was merged in 27d7c8599b (Merge branch
'js/default-branch-name-tests-final-stretch', 2021-01-25), some of those
spots caused textual conflicts, but some (for tests that were far enough
away from other changed code) were just semantic. The merge resolution
fixed up most spots, but missed this one.
Even though this did impact actual code, it turned out not to fail the
tests. Running 'cd "$master_repo"' ended up staying in the same
directory, running the test in the main trash repo instead of the
sub-repo. But because the point of the test is checking behavior when
there are no packfiles, it worked in either repo (since both are empty
at this point in the script).
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The die() routine adds a "fatal: " prefix, there is no reason to add
another one. Fixes code added in e65123a71d (builtin rebase: support
`git rebase <upstream> <switch-to>`, 2018-09-04).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set packet_trace_identity() for ls-remote. This replaces the generic
"git" identity in GIT_TRACE_PACKET=<file> traces to "ls-remote", e.g.:
[...] packet: upload-pack> version 2
[...] packet: upload-pack> agent=git/2.32.0-dev
[...] packet: ls-remote< version 2
[...] packet: ls-remote< agent=git/2.32.0-dev
Where in an "git ls-remote file://<path>" dialog ">" is the sender (or
"to the server") and "<" is the recipient (or "received by the
client").
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xmalloc() dies on error, allows zero-sized allocations and enforces
GIT_ALLOC_LIMIT for testing. Our mmap replacement doesn't need any of
that. Let's cut out the wrapper, reject zero-sized requests as required
by POSIX and use malloc(3) directly. Allocation errors were needlessly
handled by git_mmap() before; this code becomes reachable now.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On macOS, we use launchctl to manage the background maintenance
schedule. This uses a set of .plist files to describe the schedule, but
these files are also registered with 'launchctl bootstrap'. If multiple
'git maintenance start' commands run concurrently, then they can collide
replacing these schedule files and registering them with launchctl.
To avoid extra launchctl commands, do a check for the .plist files on
disk and check if they are registered using 'launchctl list <name>'.
This command will return with exit code 0 if it exists, or exit code 113
if it does not.
We can test this behavior using the GIT_TEST_MAINT_SCHEDULER environment
variable.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When two `git maintenance` processes try to write the `.plist` file, we
need to help them with serializing their efforts.
The 150ms time-out value was determined from thin air.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, $(pwd) returns a drive-letter style path C:/foo, while $PWD
contains a POSIX style /c/foo path. When we want to interpolate the
current directory in the PATH variable, we must not use the C:/foo style,
because the meaning of the colon is ambiguous. Use the POSIX style.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable D is never defined in test t5582, more severely the test
fails if D is defined by something outside the test suite, so remove
this spurious line.
Signed-off-by: Mickey Endito <mickey.endito.2323@protonmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a rebase is started with a --strategy option other than "ort" or
"recursive" then "merge -c" does not allow the user to reword the
commit message.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fast-forwarding we do not create a new commit so .git/MERGE_MSG
is not removed and can end up seeding the message of a commit made
after the rebase has finished. Avoid writing .git/MERGE_MSG when we
are fast-forwarding by writing the file after the fast-forward
checks. Note that there are no changes to the fast-forward code, it is
simply moved.
Note that the way this change is implemented means we no longer write
the author script when fast-forwarding either. I believe this is safe
for the reasons below but it is a departure from what we do when
fast-forwarding a non-merge commit. If we reword the merge then 'git
commit --amend' will keep the authorship of the commit we're rewording
as it ignores GIT_AUTHOR_* unless --reset-author is passed. It will
also export the correct GIT_AUTHOR_* variables to any hooks and we
already test the authorship of the reworded commit. If we are not
rewording then we no longer call spilt_ident() which means we are no
longer checking the commit author header looks sane. However this is
what we already do when fast-forwarding non-merge commits in
skip_unnecessary_picks() so I don't think we're breaking any promises
by not checking the author here.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of the existing reword tests check that there are no uncommitted
changes when the editor is opened. Reuse the editor script from the
last commit to fix this omission.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user runs git log while rewording a commit it is confusing if
sometimes we're amending the commit that's being reworded and at other
times we're creating a new commit depending on whether we could
fast-forward or not[1]. For this reason the reword command ensures
that there are no uncommitted changes when rewording. The reword
command also allows the user to edit the todo list while the rebase is
paused. As 'merge -c' also rewords commits make it behave like reword
and add a test.
[1] https://lore.kernel.org/git/xmqqlfvu4be3.fsf@gitster-ct.c.googlers.com/T/#m133009cb91cf0917bcf667300f061178be56680a
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rules creating the $(LIB_FILE) and $(XDIFF_LIB) archives used to
be:
$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
until commit 7b76d6bf22 (Makefile: add and use the ".DELETE_ON_ERROR"
flag, 2021-06-29) removed the '$(RM) $@' part, claiming that "we can
rely on the "c" (create) being present in ARFLAGS", and (I presume)
assuming that it means that the named archive is created from scratch.
Unfortunately, that's not what the 'c' flag does, it merely "Suppress
the diagnostic message that is written to standard error by default
when the archive is created" [1]. Consequently, all object files that
are already present in an existing archive and are not replaced will
remain there. This leads to linker errors in back-to-back builds of
different revisions without a 'make clean' between them if source
files going into these archives are renamed in between:
# The last commit renaming files that go into 'libgit.a':
# bc62692757 (hash-lookup: rename from sha1-lookup, 2020-12-31)
# sha1-lookup.c => hash-lookup.c | 14 +++++++-------
# sha1-lookup.h => hash-lookup.h | 12 ++++++------
$ git checkout bc62692757^
HEAD is now at 7a7d992d0d sha1-lookup: rename `sha1_pos()` as `hash_pos()`
$ make
[...]
$ git checkout 7b76d6bf22
HEAD is now at 7b76d6bf22 Makefile: add and use the ".DELETE_ON_ERROR" flag
$ make
[...]
AR libgit.a
LINK git
/usr/bin/ld: libgit.a(hash-lookup.o): in function `bsearch_hash':
/home/szeder/src/git/hash-lookup.c:105: multiple definition of `bsearch_hash'; libgit.a(sha1-lookup.o):/home/szeder/src/git/sha1-lookup.c:105: first defined here
collect2: error: ld returned 1 exit status
make: *** [Makefile:2213: git] Error 1
Restore the original make rules to first remove $(LIB_FILE) and
$(XDIFF_LIB) and then create them from scratch to avoid these build
errors.
[1] Quoting POSIX at:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cleanup of old compat wrappers in bash completion caused a
regression on tcsh completion that still uses them.
Let's update the tcsh call site as well for addressing it.
Fixes: 441ecdab37 ("completion: bash: remove old compat wrappers")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
__gitcomp automatically adds a suffix, but __gitcomp_nl and others
don't, we need to specify a space by default.
Can be tested with:
git config branch.autoSetupMe<tab>
This fix only works for versions of bash greater than 4.0, before that
"local sfx" creates an empty string, therefore the unset expansion
doesn't work. The same happens in zsh.
Therefore we don't add the test for that for now.
The correct fix for all shells requires semantic changes in __gitcomp,
but that can be done later.
Cc: SZEDER Gábor <szeder.dev@gmail.com>
Tested-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to ignore options that don't start with -- as well.
Depending on the value of COMP_WORDBREAKS the last word could be
duplicated otherwise.
Can be tested with:
git merge -X diff-algorithm=<tab>
Tested-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise we are completely ignoring the --cur argument.
The issue can be tested with:
git clone --config=branch.<tab>
Reviewed-by: SZEDER Gábor <szeder.dev@gmail.com>
Tested-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Codepath to access recently added oidtree data structure had
to make unaligned accesses to oids, which has been corrected.
* rs/oidtree-alignment-fix:
oidtree: avoid unaligned access to crit-bit tree
The flexible array member "k" of struct cb_node is used to store the key
of the crit-bit tree node. It offers no alignment guarantees -- in fact
the current struct layout puts it one byte after a 4-byte aligned
address, i.e. guaranteed to be misaligned.
oidtree uses a struct object_id as cb_node key. Since cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) it requires 4-byte
alignment. The mismatch is reported by UndefinedBehaviorSanitizer at
runtime like this:
hash.h:277:2: runtime error: member access within misaligned address 0x00015000802d for type 'struct object_id', which requires 4 byte alignment
0x00015000802d: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:2 in
We can fix that by:
1. eliminating the alignment requirement of struct object_id,
2. providing the alignment in struct cb_node, or
3. avoiding the issue by only using memcpy to access "k".
Currently we only store one of two values in "algo" in struct object_id.
We could use a uint8_t for that instead and widen it only once we add
support for our twohundredth algorithm or so. That would not only avoid
alignment issues, but also reduce the memory requirements for each
instance of struct object_id by ca. 9%.
Supporting keys with alignment requirements might be useful to spread
the use of crit-bit trees. It can be achieved by using a wider type for
"k" (e.g. uintmax_t), using different types for the members "byte" and
"otherbits" (e.g. uint16_t or uint32_t for each), or by avoiding the use
of flexible arrays like khash.h does.
This patch implements the third option, though, because it has the least
potential for causing side-effects and we're close to the next release.
If one of the other options is implemented later as well to get their
additional benefits we can get rid of the extra copies introduced here.
Reported-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
e9f79acb28 (ci: upgrade to using actions/{up,down}load-artifacts v2,
2021-06-23) changed all calls to that action from v1 to v2, but there
is still an open bug[1] that affects all nodejs actions and prevents
its use in 32-bit linux (as used by the Linux32 container)
move all dockerized jobs to use v1 that was built in C# and therefore
doesn't have this problem, which will otherwise manifest with confusing
messages like:
/usr/bin/docker exec 0285adacc4536b7cd962079c46f85fa05a71e66d7905b5e4b9b1a0e8b305722a sh -c "cat /etc/*release | grep ^ID"
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: no such file or directory: unknown
[1] https://github.com/actions/runner/issues/1011
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent changes to --fixup, adding amend suboption, caused the
--edit flag to be ignored as use_editor was always set to zero.
Restore edit_flag having higher priority than fixup_message when
deciding the value of use_editor by moving the edit flag condition
later in the method.
Signed-off-by: Joel Klinghed <the_jk@spawned.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Translate 48 new messages (5230t0f0u) for git 2.33.0, and also fixed
typos found by "git-po-helper".
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Fangyi Zhou <me@fangyi.io>
Format README.md using GFM (GitHub Flavored Markdown) syntax.
- In order to use more than 3 level headings, use ATX style headings
instead of setext style headings.
- In order to add highlights for code blocks, use fenced code blocks
instead of indented code blocks.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Update translation for following component:
* builtin/submodule--helper.c
Translate following new component:
* builtin/revert.c
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
* 'master' of github.com:git/git: (51 commits)
Git 2.33-rc2
object-file: use unsigned arithmetic with bit mask
Revert 'diff-merges: let "-m" imply "-p"'
object-store: avoid extra ';' from KHASH_INIT
oidtree: avoid nested struct oidtree_node
Git 2.33-rc1
test: fix for COLUMNS and bash 5
The eighth batch
diff: --pickaxe-all typofix
mingw: align symlinks-related rmdir() behavior with Linux
t7508: avoid non POSIX BRE
use fspathhash() everywhere
t0001: fix broken not-quite getcwd(3) test in bed67874e2
Documentation: render special characters correctly
reset: clear_unpack_trees_porcelain to plug leak
builtin/rebase: fix options.strategy memory lifecycle
builtin/merge: free found_ref when done
builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv
convert: release strbuf to avoid leak
read-cache: call diff_setup_done to avoid leak
...
If the user skips the final commit by removing all the changes from
the index and worktree with 'git restore' (or read-tree) and then runs
'git rebase --continue' .git/MERGE_MSG is left behind. This will seed
the commit message the next time the user commits which is not what we
want to happen.
Reported-by: Victor Gambier <vgambier@excilys.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
980b482d28 ("rebase tests: mark tests specific to the am-backend with
--am", 2020-02-15) sought to prepare tests testing the "apply" backend
in preparation for 2ac0d6273f ("rebase: change the default backend
from "am" to "merge"", 2020-02-15). However some tests seem to have
been missed leading to us testing the "merge" backend twice. This
patch fixes some cases that I noticed while adding tests to these
files, I have not audited all the other rebase test files. I've
reworded a couple of the test descriptions to make it clear which
backend they are testing.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting GIT_AUTHOR_* when committing with --amend will only change the
author if we also pass --reset-author. This commit is used in some
tests that ensure the author ident does not change when rebasing.
Creating this commit without changing the authorship meant that the
test would not catch regressions that caused rebase to discard the
original authorship information.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
make sure it uses a supported OS branch and uses all the resources
that can be allocated efficiently.
while only 1GB of memory is needed, 2GB is the minimum for a 2 CPU
machine (the default), but by increasing parallelism wall time has
been reduced by 35%.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin reported possible typos in po/id.po, all of them are mismatch
variable names. Fix them.
Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Earlier "git log -m" was changed to always produce patch output,
which would break existing scripts, which has been reverted.
* jn/log-m-does-not-imply-p:
Revert 'diff-merges: let "-m" imply "-p"'
Build fix.
* cb/many-alternate-optim-fixup:
object-file: use unsigned arithmetic with bit mask
object-store: avoid extra ';' from KHASH_INIT
oidtree: avoid nested struct oidtree_node
similar to the recently added sparse task, it is nice to know as early
as possible.
add a dockerized build using fedora (that usually has the latest gcc)
to be ahead of the curve and avoid older ISO C issues at the same time.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
33f379eee6 (make object_directory.loose_objects_subdir_seen a bitmap,
2021-07-07) replaced a wasteful 256-byte array with a 32-byte array
and bit operations. The mask calculation shifts a literal 1 of type
int left by anything between 0 and 31. UndefinedBehaviorSanitizer
doesn't like that and reports:
object-file.c:2477:18: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
Make sure to use an unsigned 1 instead to avoid the issue.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is useful for performance monitoring and debugging purposes to know
the wire protocol used for remote operations. This may differ from the
version set in local configuration due to differences in version and/or
configuration between the server and the client. Therefore, log the
negotiated wire protocol version via trace2, for both clients and
servers.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We parse through binary hunks by looping through the buffer with code
like:
llen = linelen(buffer, size);
...do something with the line...
buffer += llen;
size -= llen;
However, before we enter the loop, there is one call that increments
"buffer" but forgets to decrement "size". As a result, our "size" is off
by the length of that line, and subsequent calls to linelen() may look
past the end of the buffer for a newline.
The fix is easy: we just need to decrement size as we do elsewhere.
This bug goes all the way back to 0660626caf (binary diff: further
updates., 2006-05-05). Presumably nobody noticed because it only
triggers if the patch is corrupted, and even then we are often "saved"
by luck. We use a strbuf to store the incoming patch, so we overallocate
there, plus we add a 16-byte run of NULs as slop for memory comparisons.
So if this happened accidentally, the common case is that we'd just read
a few uninitialized bytes from the end of the strbuf before producing
the expected "this patch is corrupted" error complaint.
However, it is possible to carefully construct a case which reads off
the end of the buffer. The included test does so. It will pass both
before and after this patch when run normally, but using a tool like
ASan shows that we get an out-of-bounds read before this patch, but not
after.
Reported-by: Xingman Chen <xichixingman@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we iterate through the buffer containing git-log output, parsing
lines, we use an "int" to store the size of an individual line. This
should be a size_t, as we have no guarantee that there is not a
malicious 2GB+ commit-message line in the output.
Overflowing this integer probably doesn't do anything _too_ terrible. We
are not using the value to size a buffer, so the worst case is probably
an out-of-bounds read from before the array. But it's easy enough to
fix.
Note that we have to use ssize_t here, since we also store the length
result from parse_git_diff_header(), which may return a negative value
for error. That function actually returns an int itself, which has a
similar overflow problem, but I'll leave that for another day. Much
of the apply.c code uses ints and should be converted as a whole; in the
meantime, a negative return from parse_git_diff_header() will be
interpreted as an error, and we'll bail (so we can't handle such a case,
but given that it's likely to be malicious anyway, the important thing
is we don't have any memory errors).
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing our buffer of output from git-log, we have a
find_end_of_line() helper that finds the next newline, and gives us the
number of bytes to move past it, or the size of the whole remaining
buffer if there is no newline.
But trying to handle both those cases leads to some oddities:
- we try to overwrite the newline with NUL in the caller, by writing
over line[len-1]. This is at best redundant, since the helper will
already have done so if it saw a newline. But if it didn't see a
newline, it's actively wrong; we'll overwrite the byte at the end of
the (unterminated) line.
We could solve this just dropping the extra NUL assignment in the
caller and just letting the helper do the right thing. But...
- if we see a "diff --git" line, we'll restore the newline on top of
the NUL byte, so we can pass the string to parse_git_diff_header().
But if there was no newline in the first place, we can't do this.
There's no place to put it (the current code writes a newline
over whatever byte we obliterated earlier). The best we can do is
feed the complete remainder of the buffer to the function (which is,
in fact, a string, by virtue of being a strbuf).
To solve this, the caller needs to know whether we actually found a
newline or not. We could modify find_end_of_line() to return that
information, but we can further observe that it has only one caller.
So let's just inline it in that caller.
Nobody seems to have noticed this case, probably because git-log would
never produce input that doesn't end with a newline. Arguably we could
just return an error as soon as we see that the output does not end in a
newline. But the code to do so actually ends up _longer_, mostly because
of the cleanup we have to do in handling the error.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "offset" variable was was introduced in 44b67cb62b (range-diff:
split lines manually, 2019-07-11), but it has never done anything
useful. We use it to count up the number of bytes we've consumed, but we
never look at the result. It was probably copied accidentally from an
almost-identical loop in apply.c:find_header() (and the point of that
commit was to make use of the parse_git_diff_header() function which
underlies both).
Because the variable was set but not used, most compilers didn't seem to
notice, but the upcoming clang-14 does complain about it, via its
-Wunused-but-set-variable warning.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f5bfcc823b, which
made "git log -m" imply "--patch" by default. The logic was that
"-m", which makes diff generation for merges perform a diff against
each parent, has no use unless I am viewing the diff, so we could save
the user some typing by turning on display of the resulting diff
automatically. That wasn't expected to adversely affect scripts
because scripts would either be using a command like "git diff-tree"
that already emits diffs by default or would be combining -m with a
diff generation option such as --name-status. By saving typing for
interactive use without adversely affecting scripts in the wild, it
would be a pure improvement.
The problem is that although diff generation options are only relevant
for the displayed diff, a script author can imagine them affecting
path limiting. For example, I might run
git log -w --format=%H -- README
hoping to list commits that edited README, excluding whitespace-only
changes. In fact, a whitespace-only change is not TREESAME so the use
of -w here has no effect (since we don't apply these diff generation
flags to the diff_options struct rev_info::pruning used for this
purpose), but the documentation suggests that it should work
Suppose you specified foo as the <paths>. We shall call
commits that modify foo !TREESAME, and the rest TREESAME. (In
a diff filtered for foo, they look different and equal,
respectively.)
and a script author who has not tested whitespace-only changes
wouldn't notice.
Similarly, a script author could include
git log -m --first-parent --format=%H -- README
to filter the first-parent history for commits that modified README.
The -m is a no-op but it reflects the script author's intent. For
example, until 1e20a407fe (stash list: stop passing "-m" to "git
log", 2021-05-21), "git stash list" did this.
As a result, we can't safely change "-m" to imply "-p" without fear of
breaking such scripts. Restore the previous behavior.
Noticed because Rust's src/bootstrap/bootstrap.py made use of this
same construct: https://github.com/rust-lang/rust/pull/87513. That
script has been updated to omit the unnecessary "-m" option, but we
can expect other scripts in the wild to have similar expectations.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d540b70c85 (merge: cleanup messages like commit, 2019-04-17) adds
a way to change part of the helper text using a single call to
strbuf_add_commented_addf but with two formats with varying number
of parameters.
this trigger a warning in old versions of Xcode (ex 8.0), so use
instead two independent calls with a matching number of parameters
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cf2dc1c238 (speed up alt_odb_usable() with many alternates, 2021-07-07)
introduces a KHASH_INIT invocation with a trailing ';', which while
commonly expected will trigger warnings with pedantic on both
clang[-Wextra-semi] and gcc[-Wpedantic], because that macro has already
a semicolon and is meant to be invoked without one.
while fixing the macro would be a worthy solution (specially considering
this is a common recurring problem), remove the extra ';' for now to
minimize churn.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
92d8ed8ac1 (oidtree: a crit-bit tree for odb_loose_cache, 2021-07-07)
adds a struct oidtree_node that contains only an n field with a
struct cb_node.
unfortunately, while building in pedantic mode witch clang 12 (as well
as a similar error from gcc 11) it will show:
oidtree.c:11:17: error: 'n' may not be nested in a struct due to flexible array member [-Werror,-Wflexible-array-extensions]
struct cb_node n;
^
because of a constrain coded in ISO C 11 6.7.2.1¶3 that forbids using
structs that contain a flexible array as part of another struct.
use a strict cb_node directly instead.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The case statement in detect-compiler notices 'clang', 'FreeBSD
clang' and 'Apple clang', but there are other platforms that follow
the '$VENDOR clang' pattern (e.g. Debian).
Generalize the pattern to catch them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_family and get_version helpers of detect-compiler assume
that the line to identify the version from the compilers have a
token "version", followed by the version number, followed by some
other string, e.g.
$ CC=gcc get_version_line
gcc version 10.2.1 20210110 (Debian 10.2.1-6)
But that is not necessarily true, e.g.
$ CC=clang get_version_line
Debian clang version 11.0.1-2
Tweak the script not to require extra string after the version.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1da1580e4c (Makefile: detect compiler and enable more warnings in
DEVELOPER=1, 2018-04-14) uses the output of the compiler banner to
detect the compiler family.
Apple had since changed the wording used to refer to its compiler
as clang instead of LLVM as shown by:
$ cc --version
Apple clang version 12.0.5 (clang-1205.0.22.9)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
so update the script to match, and allow DEVELOPER=1 to work as
expected again in macOS.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since c49a177bec (test-lib.sh: set COLUMNS=80 for --verbose
repeatability, 2021-06-29) multiple tests have been failing when using
bash 5 because checkwinsize is enabled by default, therefore COLUMNS is
reset using TIOCGWINSZ even for non-interactive shells.
It's debatable whether or not bash should even be doing that, but for
now we can avoid this undesirable behavior by disabling this option.
Reported-by: Fabian Stelzer <fabian.stelzer@campoint.net>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
[jc: with SZEDER Gábor's suggestion to do this before setting COLUMNS]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were two locations in the code that referred to 'merge-recursive'
but which were also applicable to 'merge-ort'. Update them to more
general wording.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 58634dbff8 ("rebase: Allow merge strategies to be used when
rebasing", 2006-06-21) added the --merge option to git-rebase so that
renames could be detected (at least when using the `recursive` merge
backend). However, git-am -3 gained that same ability in commit
579c9bb198 ("Use merge-recursive in git-am -3.", 2006-12-28). As such,
the comment about being able to detect renames is not particularly
noteworthy. Remove it.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have diff-algorithm that explains why there are special diff
algorithms, so we do not need to re-explain patience. patience exists
as its own toplevel option for historical reasons, but there's no reason
to give it special preference or document it again and suggest it's more
important than other diff algorithms, so just refer to it as a
deprecated shorthand for `diff-algorithm=patience`.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stating that the recursive strategy "currently cannot make use of
detected copies" implies that this is a technical shortcoming of the
current algorithm. I disagree with that. I don't see how copies could
possibly be used in a sane fashion in a merge algorithm -- would we
propagate changes in one file on one side of history to each copy of
that file when merging? That makes no sense to me. I cannot think of
anything else that would make sense either. Change the wording to
simply state that we ignore any copies.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is probably helpful to cover the default merge strategy first, so
move the text for the resolve strategy to later in the document.
Further, the wording for "resolve" claimed that it was "considered
generally safe and fast", which might imply in some readers minds that
the same is not true of other strategies. Rather than adding this text
to all the strategies, just remove it from this one.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few places in the documentation referred to the "`recursive` strategy"
using the phrase "`git merge-recursive`", suggesting that it was forking
subprocesses to call a toplevel builtin. Perhaps that was relevant to
when rebase was a shell script, but it seems like a rather indirect way
to refer to the `recursive` strategy. Simplify the references.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 0c4fd732f0 ("Move computation of dir_rename_count from
merge-ort to diffcore-rename", 2021-02-27), much of the logic for
computing directory renames moved into diffcore-rename.
directory-rename-detection.txt had claims that all of that logic was
found in merge-recursive. Update the documentation.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --rebase-merges was first introduced, it only worked with the
`recursive` strategy. Some time later, it gained support for merges
using the `octopus` strategy. The limitation of only supporting these
two strategies was documented in 25cff9f109 ("rebase -i --rebase-merges:
add a section to the man page", 2018-04-25) and lifted in e145d99347
("rebase -r: support merge strategies other than `recursive`",
2019-07-31). However, when the limitation was lifted, the documentation
was not updated. Update it now.
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Windows rmdir() equivalent behaves differently from POSIX ones in
that when used on a symbolic link that points at a directory, the
target directory gets removed, which has been corrected.
* tb/mingw-rmdir-symlink-to-directory:
mingw: align symlinks-related rmdir() behavior with Linux
The local changes stashed by "git merge --autostash" were lost when
the merge failed in certain ways, which has been corrected.
* pb/merge-autostash-more:
merge: apply autostash if merge strategy fails
merge: apply autostash if fast-forward fails
Documentation: define 'MERGE_AUTOSTASH'
merge: add missing word "strategy" to a message
Further optimization on "merge -sort" backend.
* en/ort-perf-batch-14:
merge-ort: restart merge with cached renames to reduce process entry cost
merge-ort: avoid recursing into directories when we don't need to
merge-ort: defer recursing into directories when merge base is matched
merge-ort: add a handle_deferred_entries() helper function
merge-ort: add data structures for allowable trivial directory resolves
merge-ort: add some more explanations in collect_merge_info_callback()
merge-ort: resolve paths early when we have sufficient information
Reorganize and update the SubmitingPatches document.
* ab/update-submitting-patches:
SubmittingPatches: replace discussion of Travis with GitHub Actions
SubmittingPatches: move discussion of Signed-off-by above "send"
Leak plugging.
* ah/plugleaks:
reset: clear_unpack_trees_porcelain to plug leak
builtin/rebase: fix options.strategy memory lifecycle
builtin/merge: free found_ref when done
builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv
convert: release strbuf to avoid leak
read-cache: call diff_setup_done to avoid leak
ref-filter: also free head for ATOM_HEAD to avoid leak
diffcore-rename: move old_dir/new_dir definition to plug leak
builtin/for-each-repo: remove unnecessary argv copy to plug leak
builtin/submodule--helper: release unused strbuf to avoid leak
environment: move strbuf into block to plug leak
fmt-merge-msg: free newly allocated temporary strings when done
Rewrite of "git submodule" in C continues.
* ar/submodule-add:
submodule: drop unused sm_name parameter from show_fetch_remotes()
submodule--helper: introduce add-clone subcommand
submodule--helper: refactor module_clone()
submodule: prefix die messages with 'fatal'
t7400: test failure to add submodule in tracked path
When I was fixing fuzzies as I updating po/id.po for 2.33.0 l10n round,
I noticed a triple-dash typo (--pickaxe-all) at diff.c, which according
to git-diff(1) manpage, the correct option name should be --pickaxe-all.
Fix the typo.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing a rebase, rmdir() is called on the folder .git/logs. On
Unix rmdir() exits without deleting anything in case .git/logs is a
symbolic link but the equivalent functions on Windows (_rmdir, _wrmdir
and RemoveDirectoryW) do not behave the same and remove the folder if it
is symlinked even if it is not empty.
This creates issues when folders in .git/ are symlinks which is
especially the case when git-repo[1] is used: It replaces `.git/logs/`
with a symlink.
One such issue is that the _target_ of that symlink is removed e.g.
during a `git rebase`, where `delete_reflog("REBASE_HEAD")` will not
only try to remove `.git/logs/REBASE_HEAD` but then recursively try to
remove the parent directories until an error occurs, a technique that
obviously relies on `rmdir()` refusing to remove a symlink.
This was reported in https://github.com/git-for-windows/git/issues/2967.
This commit updates mingw_rmdir() so that its behavior is the same as
Linux rmdir() in case of symbolic links.
To verify that Git does not regress on the reported issue, this patch
adds a regression test for the `git rebase` symptom, even if the same
`rmdir()` behavior is quite likely to cause potential problems in other
Git commands as well.
[1]: git-repo is a python tool built on top of Git which helps manage
many Git repositories. It stores all the .git/ folders in a central
place by taking advantage of symbolic links.
More information: https://gerrit.googlesource.com/git-repo/
Signed-off-by: Thomas Bétous <tomspycell@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
24c30e0b6 (wt-status: tolerate dangling marks, 2020-09-01) adds a test
that uses a BRE which breaks at least with OpenBSD's grep.
switch to an ERE as it is done for similar checks and while at it, remove
the now obsolete test_i18ngrep call.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we introduced new documentation that talks
about "[commit|object] prerequsite(s)", but also faithfully moved
around existing documentation that talks about the "basis".
Let's change both that moved-around documentation and other existing
documentation in the file to consistently use "[commit|object]"
prerequisite(s)" instead of talking about "basis". The mention of
"basis" isn't wrong, but readers will be helped by us using only one
term throughout the document for this concept.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elaborate on the restriction that you cannot provide a revision that
doesn't resolve to a reference in the "SPECIFYING REFERENCES" section
with examples.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out the discussion bout "object prerequisites" into its own
section, and add some more examples of the common cases.
See 2e0afafebd (Add git-bundle: move objects and references by
archive, 2007-02-22) for the introduction of the documentation being
changed here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite the "DESCRIPTION" section for "git bundle" to start by talking
about what bundles are in general terms, rather than diving directly
into one example of what they might be used for.
This changes documentation that's been substantially the same ever
since the command was added in 2e0afafebd (Add git-bundle: move
objects and references by archive, 2007-02-22).
I've split up the DESCRIPTION into that section and a "BUNDLE FORMAT"
section, it briefly discusses the format, but then links to the
technical/bundle-format.txt documentation.
The "the user must specify a basis" part of this is discussed below in
"SPECIFYING REFERENCES", and will be further elaborated on in a
subsequent commit. So I'm removing that part and letting the mention
of "revision exclusions" suffice.
There was a discussion about whether to say anything at all about
"thin packs" here[1]. I think it's good to mention it for the curious
reader willing to read the technical docs, but let's explicitly say
that there's no "thick pack", and that the difference shouldn't
matter.
1. http://lore.kernel.org/git/xmqqk0mbt5rj.fsf@gitster.g
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A race between repacking and using pack bitmaps has been corrected.
* jk/check-pack-valid-before-opening-bitmap:
pack-bitmap: check pack validity when opening bitmap
"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.
* jt/bulk-prefetch:
cache-tree: prefetch in partial clone read-tree
unpack-trees: refactor prefetching code
CI update.
* js/ci-check-whitespace-updates:
ci(check-whitespace): restrict to the intended commits
ci(check-whitespace): stop requiring a read/write token
Documentation around GIT_CONFIG has been updated.
* jk/config-env-doc:
doc/git-config: simplify "override" advice for FILES section
doc/git-config: clarify GIT_CONFIG environment variable
doc/git-config: explain --file instead of referring to GIT_CONFIG
cf2dc1c238 (speed up alt_odb_usable() with many alternates, 2021-07-07)
introduced the function fspathhash() for calculating path hashes while
respecting the configuration option core.ignorecase. Call it instead of
open-coding it; the resulting code is shorter and less repetitive.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With a54e938e5b (strbuf: support long paths w/o read rights in
strbuf_getcwd() on FreeBSD, 2017-03-26) we had t0001 break on systems
like OpenBSD and AIX whose getcwd(3) has standard (but not like glibc
et al) behavior.
This was partially fixed in bed67874e2 (t0001: skip test with
restrictive permissions if getpwd(3) respects them, 2017-08-07).
The problem with that fix is that while its analysis of the problem is
correct, it doesn't actually call getcwd(3), instead it invokes "pwd
-P". There is no guarantee that "pwd -P" is going to call getcwd(3),
as opposed to e.g. being a shell built-in.
On AIX under both bash and ksh this test breaks because "pwd -P" will
happily display the current working directory, but getcwd(3) called by
the "git init" we're testing here will fail to get it.
I checked whether clobbering the $PWD environment variable would
affect it, and it didn't. Presumably these shells keep track of their
working directory internally.
There's possible follow-up work here in teaching strbuf_getcwd() to
get the working directory with whatever method "pwd" uses on these
platforms. See [1] for a discussion of that, but let's take the easy
way out here and just skip these tests by fixing the
GETCWD_IGNORES_PERMS prerequisite to match the limitations of
strbuf_getcwd().
1. https://lore.kernel.org/git/b650bef5-d739-d98d-e9f1-fa292b6ce982@web.de/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three hyphens are rendered verbatim, so "--" has to be used to produce a
dash. There is no double arrow ("<->" is rendered as "<→"), so a left
and right arrow "<-->" have to be combined for that.
So fix asciidoc output for special characters. This is similar to fixes
in commit de82095a95 (doc hash-function-transition: fix asciidoc output,
2021-02-05).
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"TEST_OUTPUT_DIRECTORY=there make test" failed to work, which has
been corrected.
* ps/t0000-output-directory-fix:
t0000: fix test if run with TEST_OUTPUT_DIRECTORY
The code that gives an error message in "git multi-pack-index" when
no subcommand is given tried to print a NULL pointer as a strong,
which has been corrected.
* tb/reverse-midx:
multi-pack-index: fix potential segfault without sub-command
The completion support used to offer alternate spelling of options
that exist only for compatibility, which has been corrected.
* pb/dont-complete-aliased-options:
parse-options: don't complete option aliases by default
Documentation on "git diff -l<n>" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.
* en/rename-limits-doc:
rename: bump limit defaults yet again
diffcore-rename: treat a rename_limit of 0 as unlimited
doc: clarify documentation for rename/copy limits
diff: correct warning message when renameLimit exceeded
A guideline for gender neutral documentation has been added.
* ds/gender-neutral-doc-guidelines:
CodingGuidelines: recommend gender-neutral description
"git status" codepath learned to work with sparsely populated index
without hydrating it fully.
* ds/status-with-sparse-index:
t1092: document bad sparse-checkout behavior
fsmonitor: integrate with sparse index
wt-status: expand added sparse directory entries
status: use sparse-index throughout
status: skip sparse-checkout percentage with sparse-index
diff-lib: handle index diffs with sparse dirs
dir.c: accept a directory as part of cone-mode patterns
unpack-trees: unpack sparse directory entries
unpack-trees: rename unpack_nondirectories()
unpack-trees: compare sparse directories correctly
unpack-trees: preserve cache_bottom
t1092: add tests for status/add and sparse files
t1092: expand repository data shape
t1092: replace incorrect 'echo' with 'cat'
sparse-index: include EXTENDED flag when expanding
sparse-index: skip indexes with unmerged entries
The CI gained a new job to run "make sparse" check.
* js/ci-make-sparse:
ci/install-dependencies: handle "sparse" job package installs
ci: run "apt-get update" before "apt-get install"
ci: run `make sparse` as part of the GitHub workflow
Tests that cover protocol bits have been updated and helpers
used there have been consolidated.
* ab/pkt-line-tests:
test-lib-functions: use test-tool for [de]packetize()
Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.
* ab/attribute-format:
advice.h: add missing __attribute__((format)) & fix usage
*.h: add a few missing __attribute__((format))
*.c static functions: add missing __attribute__((format))
sequencer.c: move static function to avoid forward decl
*.c static functions: don't forward-declare __attribute__
Optimize "git log" for cases where we wasted cycles to load ref
decoration data that may not be needed.
* jk/log-decorate-optim:
load_ref_decorations(): fix decoration with tags
add_ref_decoration(): rename s/type/deco_type/
load_ref_decorations(): avoid parsing non-tag objects
object.h: add lookup_object_by_type() function
object.h: expand docstring for lookup_unknown_object()
log: avoid loading decorations for userformats that don't need it
pretty.h: update and expand docstring for userformat_find_requirements()
"git worktree add --lock" learned to record why the worktree is
locked with a custom message.
* sm/worktree-add-lock:
worktree: teach `add` to accept --reason <string> with --lock
worktree: mark lock strings with `_()` for translation
t2400: clean up '"add" worktree with lock' test
Optimization for repositories with many alternate object store.
* ew/many-alternate-optim:
oidtree: a crit-bit tree for odb_loose_cache
oidcpy_with_padding: constify `src' arg
make object_directory.loose_objects_subdir_seen a bitmap
avoid strlen via strbuf_addstr in link_alt_odb_entry
speed up alt_odb_usable() with many alternates
"git commit --allow-empty-message" won't abort the operation upon
an empty message, but the hint shown in the editor said otherwise.
* hj/commit-allow-empty-message:
commit: remove irrelavent prompt on `--allow-empty-message`
commit: reorganise commit hint strings
This just matches the style/location of the package installation for
other jobs. There should be no functional change.
I did flip the order of the options and command-name ("-y update"
instead of "update -y") for consistency with other lines in the same
file.
Note also that we have to reorder the dependency install with the
"checkout" action, so that we actually have the "ci" scripts available.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "sparse" workflow runs "apt-get install" to pick up a few necessary
packages. But it needs to run "apt-get update" first, or it risks trying
to download an old package version that no longer exists. And in fact
this happens now, with output like:
2021-07-26T17:40:51.2551880Z E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/c/curl/libcurl4-openssl-dev_7.68.0-1ubuntu2.5_amd64.deb 404 Not Found [IP: 52.147.219.192 80]
2021-07-26T17:40:51.2554304Z E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Our other ci jobs don't suffer from this; they rely on scripts in ci/,
and ci/install-dependencies does the appropriate "apt-get update".
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup_unpack_trees_porcelain() populates various fields on
unpack_tree_opts, we need to call clear_unpack_trees_porcelain() to
avoid leaking them. Specifically, we used to leak
unpack_tree_opts.msgs_to_free.
We have to do this in leave_reset_head because there are multiple
scenarios where unpack_tree_opts has already been configured, followed
by a 'goto leave_reset_head'. But we can also 'goto leave_reset_head'
prior to having initialised unpack_tree_opts via memset(..., 0, ...).
Therefore we also move unpack_tree_opts initialisation to the start of
reset_head(), and convert it to use brace initialisation - which
guarantees that we can never clear an uninitialised unpack_tree_opts.
clear_unpack_tree_opts() is always safe to call as long as
unpack_tree_opts is at least zero-initialised, i.e. it does not depend
on a previous call to setup_unpack_trees_porcelain().
LSAN output from t0021:
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa721e5 in xrealloc wrapper.c:126:8
#2 0x9f7861 in strvec_push_nodup strvec.c:19:2
#3 0x9f7861 in strvec_pushf strvec.c:39:2
#4 0xa43e14 in setup_unpack_trees_porcelain unpack-trees.c:129:3
#5 0x97e011 in reset_head reset.c:53:2
#6 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
#7 0x4ce83e in run_builtin git.c:475:11
#8 0x4ccafe in handle_builtin git.c:729:3
#9 0x4cb01c in run_argv git.c:818:4
#10 0x4cb01c in cmd_main git.c:949:19
#11 0x6b3f3d in main common-main.c:52:11
#12 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 147 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa721e5 in xrealloc wrapper.c:126:8
#2 0x9e8d54 in strbuf_grow strbuf.c:98:2
#3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
#4 0x9f7774 in strvec_pushf strvec.c:36:2
#5 0xa43e14 in setup_unpack_trees_porcelain unpack-trees.c:129:3
#6 0x97e011 in reset_head reset.c:53:2
#7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
#8 0x4ce83e in run_builtin git.c:475:11
#9 0x4ccafe in handle_builtin git.c:729:3
#10 0x4cb01c in run_argv git.c:818:4
#11 0x4cb01c in cmd_main git.c:949:19
#12 0x6b3f3d in main common-main.c:52:11
#13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 134 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa721e5 in xrealloc wrapper.c:126:8
#2 0x9e8d54 in strbuf_grow strbuf.c:98:2
#3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
#4 0x9f7774 in strvec_pushf strvec.c:36:2
#5 0xa43fe4 in setup_unpack_trees_porcelain unpack-trees.c:168:3
#6 0x97e011 in reset_head reset.c:53:2
#7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
#8 0x4ce83e in run_builtin git.c:475:11
#9 0x4ccafe in handle_builtin git.c:729:3
#10 0x4cb01c in run_argv git.c:818:4
#11 0x4cb01c in cmd_main git.c:949:19
#12 0x6b3f3d in main common-main.c:52:11
#13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 130 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa721e5 in xrealloc wrapper.c:126:8
#2 0x9e8d54 in strbuf_grow strbuf.c:98:2
#3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
#4 0x9f7774 in strvec_pushf strvec.c:36:2
#5 0xa43f20 in setup_unpack_trees_porcelain unpack-trees.c:150:3
#6 0x97e011 in reset_head reset.c:53:2
#7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
#8 0x4ce83e in run_builtin git.c:475:11
#9 0x4ccafe in handle_builtin git.c:729:3
#10 0x4cb01c in run_argv git.c:818:4
#11 0x4cb01c in cmd_main git.c:949:19
#12 0x6b3f3d in main common-main.c:52:11
#13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 603 byte(s) leaked in 4 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- cmd_rebase populates rebase_options.strategy with newly allocated
strings, hence we need to free those strings at the end of cmd_rebase
to avoid a leak.
- In some cases: get_replay_opts() is called, which prepares replay_opts
using data from rebase_options. We used to simply copy the pointer
from rebase_options.strategy, however that would now result in a
double-free because sequencer_remove_state() is eventually used to
free replay_opts.strategy. To avoid this we xstrdup() strategy when
adding it to replay_opts.
The original leak happens because we always populate
rebase_options.strategy, but we don't always enter the path that calls
get_replay_opts() and later sequencer_remove_state() - in other words
we'd always allocate a new string into rebase_options.strategy but
only sometimes did we free it. We now make sure that rebase_options
and replay_opts both own their own copies of strategy, and each copy
is free'd independently.
This was first seen when running t0021 with LSAN, but t2012 helped catch
the fact that we can't just free(options.strategy) at the end of
cmd_rebase (as that can cause a double-free). LSAN output from t0021:
LSAN output from t0021:
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xa71eb8 in xstrdup wrapper.c:29:14
#2 0x61b1cc in cmd_rebase builtin/rebase.c:1779:22
#3 0x4ce83e in run_builtin git.c:475:11
#4 0x4ccafe in handle_builtin git.c:729:3
#5 0x4cb01c in run_argv git.c:818:4
#6 0x4cb01c in cmd_main git.c:949:19
#7 0x6b3fad in main common-main.c:52:11
#8 0x7f267b512349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge_name() calls dwim_ref(), which allocates a new string into
found_ref. Therefore add a free() to avoid leaking found_ref.
LSAN output from t0021:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xa8beb8 in xstrdup wrapper.c:29:14
#2 0x954054 in expand_ref refs.c:671:12
#3 0x953cb6 in repo_dwim_ref refs.c:644:22
#4 0x5d3759 in dwim_ref refs.h:162:9
#5 0x5d3759 in merge_name builtin/merge.c:517:6
#6 0x5d3759 in collect_parents builtin/merge.c:1214:5
#7 0x5cf60d in cmd_merge builtin/merge.c:1458:16
#8 0x4ce83e in run_builtin git.c:475:11
#9 0x4ccafe in handle_builtin git.c:729:3
#10 0x4cb01c in run_argv git.c:818:4
#11 0x4cb01c in cmd_main git.c:949:19
#12 0x6bdbfd in main common-main.c:52:11
#13 0x7f0430502349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These leaks all happen at the end of cmd_mv, hence don't matter in any
way. But we still fix the easy ones and squash the rest to get us closer
to being able to run tests without leaks.
LSAN output from t0050:
Direct leak of 384 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c015 in xrealloc wrapper.c:126:8
#2 0xa0a7e1 in add_entry string-list.c:44:2
#3 0xa0a7e1 in string_list_insert string-list.c:58:14
#4 0x5dac03 in cmd_mv builtin/mv.c:248:4
#5 0x4ce83e in run_builtin git.c:475:11
#6 0x4ccafe in handle_builtin git.c:729:3
#7 0x4cb01c in run_argv git.c:818:4
#8 0x4cb01c in cmd_main git.c:949:19
#9 0x6bd9ad in main common-main.c:52:11
#10 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0xa8bd09 in do_xmalloc wrapper.c:41:8
#2 0x5dbc34 in internal_prefix_pathspec builtin/mv.c:32:2
#3 0x5da575 in cmd_mv builtin/mv.c:158:14
#4 0x4ce83e in run_builtin git.c:475:11
#5 0x4ccafe in handle_builtin git.c:729:3
#6 0x4cb01c in run_argv git.c:818:4
#7 0x4cb01c in cmd_main git.c:949:19
#8 0x6bd9ad in main common-main.c:52:11
#9 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0xa8bd09 in do_xmalloc wrapper.c:41:8
#2 0x5dbc34 in internal_prefix_pathspec builtin/mv.c:32:2
#3 0x5da4e4 in cmd_mv builtin/mv.c:148:11
#4 0x4ce83e in run_builtin git.c:475:11
#5 0x4ccafe in handle_builtin git.c:729:3
#6 0x4cb01c in run_argv git.c:818:4
#7 0x4cb01c in cmd_main git.c:949:19
#8 0x6bd9ad in main common-main.c:52:11
#9 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x49a9a2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0xa8c119 in xcalloc wrapper.c:140:8
#2 0x5da585 in cmd_mv builtin/mv.c:159:22
#3 0x4ce83e in run_builtin git.c:475:11
#4 0x4ccafe in handle_builtin git.c:729:3
#5 0x4cb01c in run_argv git.c:818:4
#6 0x4cb01c in cmd_main git.c:949:19
#7 0x6bd9ad in main common-main.c:52:11
#8 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x49a9a2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0xa8c119 in xcalloc wrapper.c:140:8
#2 0x5da4f8 in cmd_mv builtin/mv.c:149:10
#3 0x4ce83e in run_builtin git.c:475:11
#4 0x4ccafe in handle_builtin git.c:729:3
#5 0x4cb01c in run_argv git.c:818:4
#6 0x4cb01c in cmd_main git.c:949:19
#7 0x6bd9ad in main common-main.c:52:11
#8 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c015 in xrealloc wrapper.c:126:8
#2 0xa00226 in strbuf_grow strbuf.c:98:2
#3 0xa00226 in strbuf_vaddf strbuf.c:394:3
#4 0xa065c7 in xstrvfmt strbuf.c:981:2
#5 0xa065c7 in xstrfmt strbuf.c:991:8
#6 0x9e7ce7 in prefix_path_gently setup.c:115:15
#7 0x9e7fa6 in prefix_path setup.c:128:12
#8 0x5dbdbf in internal_prefix_pathspec builtin/mv.c:55:23
#9 0x5da575 in cmd_mv builtin/mv.c:158:14
#10 0x4ce83e in run_builtin git.c:475:11
#11 0x4ccafe in handle_builtin git.c:729:3
#12 0x4cb01c in run_argv git.c:818:4
#13 0x4cb01c in cmd_main git.c:949:19
#14 0x6bd9ad in main common-main.c:52:11
#15 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c015 in xrealloc wrapper.c:126:8
#2 0xa00226 in strbuf_grow strbuf.c:98:2
#3 0xa00226 in strbuf_vaddf strbuf.c:394:3
#4 0xa065c7 in xstrvfmt strbuf.c:981:2
#5 0xa065c7 in xstrfmt strbuf.c:991:8
#6 0x9e7ce7 in prefix_path_gently setup.c:115:15
#7 0x9e7fa6 in prefix_path setup.c:128:12
#8 0x5dbdbf in internal_prefix_pathspec builtin/mv.c:55:23
#9 0x5da4e4 in cmd_mv builtin/mv.c:148:11
#10 0x4ce83e in run_builtin git.c:475:11
#11 0x4ccafe in handle_builtin git.c:729:3
#12 0x4cb01c in run_argv git.c:818:4
#13 0x4cb01c in cmd_main git.c:949:19
#14 0x6bd9ad in main common-main.c:52:11
#15 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 558 byte(s) leaked in 7 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply_multi_file_filter and async_query_available_blobs both query
subprocess output using subprocess_read_status, which writes data into
the identically named filter_status strbuf. We add a strbuf_release to
avoid leaking their contents.
Leak output seen when running t0021 with LSAN:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c2b5 in xrealloc wrapper.c:126:8
#2 0x9ff99d in strbuf_grow strbuf.c:98:2
#3 0x9ff99d in strbuf_addbuf strbuf.c:304:2
#4 0xa101d6 in subprocess_read_status sub-process.c:45:5
#5 0x77793c in apply_multi_file_filter convert.c:886:8
#6 0x77793c in apply_filter convert.c:1042:10
#7 0x77a0b5 in convert_to_git_filter_fd convert.c:1492:7
#8 0x8b48cd in index_stream_convert_blob object-file.c:2156:2
#9 0x8b48cd in index_fd object-file.c:2248:9
#10 0x597411 in hash_fd builtin/hash-object.c:43:9
#11 0x596be1 in hash_object builtin/hash-object.c:59:2
#12 0x596be1 in cmd_hash_object builtin/hash-object.c:153:3
#13 0x4ce83e in run_builtin git.c:475:11
#14 0x4ccafe in handle_builtin git.c:729:3
#15 0x4cb01c in run_argv git.c:818:4
#16 0x4cb01c in cmd_main git.c:949:19
#17 0x6bdc2d in main common-main.c:52:11
#18 0x7f42acf79349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s).
Direct leak of 120 byte(s) in 5 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c295 in xrealloc wrapper.c:126:8
#2 0x9ff97d in strbuf_grow strbuf.c:98:2
#3 0x9ff97d in strbuf_addbuf strbuf.c:304:2
#4 0xa101b6 in subprocess_read_status sub-process.c:45:5
#5 0x775c73 in async_query_available_blobs convert.c:960:8
#6 0x80029d in finish_delayed_checkout entry.c:183:9
#7 0xa65d1e in check_updates unpack-trees.c:493:10
#8 0xa5f469 in unpack_trees unpack-trees.c:1747:8
#9 0x525971 in checkout builtin/clone.c:815:6
#10 0x525971 in cmd_clone builtin/clone.c:1409:8
#11 0x4ce83e in run_builtin git.c:475:11
#12 0x4ccafe in handle_builtin git.c:729:3
#13 0x4cb01c in run_argv git.c:818:4
#14 0x4cb01c in cmd_main git.c:949:19
#15 0x6bdc2d in main common-main.c:52:11
#16 0x7fa253fce349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 120 byte(s) leaked in 5 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
repo_diff_setup() calls through to diff.c's static prep_parse_options(),
which in turn allocates a new array into diff_opts.parseopts.
diff_setup_done() is responsible for freeing that array, and has the
benefit of verifying diff_opts too - hence we add a call to
diff_setup_done() to avoid leaking parseopts.
Output from the leak as found while running t0090 with LSAN:
Direct leak of 7120 byte(s) in 1 object(s) allocated from:
#0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0xa8bf89 in do_xmalloc wrapper.c:41:8
#2 0x7a7bae in prep_parse_options diff.c:5636:2
#3 0x7a7bae in repo_diff_setup diff.c:4611:2
#4 0x93716c in repo_index_has_changes read-cache.c:2518:3
#5 0x872233 in unclean merge-ort-wrappers.c:12:14
#6 0x872233 in merge_ort_recursive merge-ort-wrappers.c:53:6
#7 0x5d5b11 in try_merge_strategy builtin/merge.c:752:12
#8 0x5d0b6b in cmd_merge builtin/merge.c:1666:9
#9 0x4ce83e in run_builtin git.c:475:11
#10 0x4ccafe in handle_builtin git.c:729:3
#11 0x4cb01c in run_argv git.c:818:4
#12 0x4cb01c in cmd_main git.c:949:19
#13 0x6bdc2d in main common-main.c:52:11
#14 0x7f551eb51349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 7120 byte(s) leaked in 1 allocation(s)
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
u.head is populated using resolve_refdup(), which returns a newly
allocated string - hence we also need to free() it.
Found while running t0041 with LSAN:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xa8be98 in xstrdup wrapper.c:29:14
#2 0x9481db in head_atom_parser ref-filter.c:549:17
#3 0x9408c7 in parse_ref_filter_atom ref-filter.c:703:30
#4 0x9400e3 in verify_ref_format ref-filter.c:974:8
#5 0x4f9e8b in print_ref_list builtin/branch.c:439:6
#6 0x4f9e8b in cmd_branch builtin/branch.c:757:3
#7 0x4ce83e in run_builtin git.c:475:11
#8 0x4ccafe in handle_builtin git.c:729:3
#9 0x4cb01c in run_argv git.c:818:4
#10 0x4cb01c in cmd_main git.c:949:19
#11 0x6bdc2d in main common-main.c:52:11
#12 0x7f96edf86349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
old_dir/new_dir are free()'d at the end of update_dir_rename_counts,
however if we return early we'll never free those strings. Therefore
we should move all new allocations after the possible early return,
avoiding a leak.
This seems like a fairly recent leak, that started happening since the
early-return was added in:
1ad69eb0dc (diffcore-rename: compute dir_rename_counts in stages, 2021-02-27)
LSAN output from t0022:
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xa71e48 in xstrdup wrapper.c:29:14
#2 0x7db9c7 in update_dir_rename_counts diffcore-rename.c:464:12
#3 0x7db6ae in find_renames diffcore-rename.c:1062:3
#4 0x7d76c3 in diffcore_rename_extended diffcore-rename.c:1472:18
#5 0x7b4cfc in diffcore_std diff.c:6705:4
#6 0x855e46 in log_tree_diff_flush log-tree.c:846:2
#7 0x856574 in log_tree_diff log-tree.c:955:3
#8 0x856574 in log_tree_commit log-tree.c:986:10
#9 0x9a9c67 in print_commit_summary sequencer.c:1329:7
#10 0x52e623 in cmd_commit builtin/commit.c:1862:3
#11 0x4ce83e in run_builtin git.c:475:11
#12 0x4ccafe in handle_builtin git.c:729:3
#13 0x4cb01c in run_argv git.c:818:4
#14 0x4cb01c in cmd_main git.c:949:19
#15 0x6b3f3d in main common-main.c:52:11
#16 0x7fe397c7a349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xa71e48 in xstrdup wrapper.c:29:14
#2 0x7db9bc in update_dir_rename_counts diffcore-rename.c:463:12
#3 0x7db6ae in find_renames diffcore-rename.c:1062:3
#4 0x7d76c3 in diffcore_rename_extended diffcore-rename.c:1472:18
#5 0x7b4cfc in diffcore_std diff.c:6705:4
#6 0x855e46 in log_tree_diff_flush log-tree.c:846:2
#7 0x856574 in log_tree_diff log-tree.c:955:3
#8 0x856574 in log_tree_commit log-tree.c:986:10
#9 0x9a9c67 in print_commit_summary sequencer.c:1329:7
#10 0x52e623 in cmd_commit builtin/commit.c:1862:3
#11 0x4ce83e in run_builtin git.c:475:11
#12 0x4ccafe in handle_builtin git.c:729:3
#13 0x4cb01c in run_argv git.c:818:4
#14 0x4cb01c in cmd_main git.c:949:19
#15 0x6b3f3d in main common-main.c:52:11
#16 0x7fe397c7a349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 14 byte(s) leaked in 2 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_for_each_repo() copies argv into args (a strvec), which is later
passed into run_command_on_repo(), which in turn copies that strvec onto
the end of child.args. The initial copy is unnecessary (we never modify
args). We therefore choose to just pass argv directly into
run_command_on_repo(), which lets us avoid the copy and fixes the leak.
LSAN output from t0068:
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x7f63bd4ab8b0 in realloc (/usr/lib64/libasan.so.4+0xdc8b0)
#1 0x98d7e6 in xrealloc wrapper.c:126
#2 0x916914 in strvec_push_nodup strvec.c:19
#3 0x916a6e in strvec_push strvec.c:26
#4 0x4be4eb in cmd_for_each_repo builtin/for-each-repo.c:49
#5 0x410dcd in run_builtin git.c:475
#6 0x410dcd in handle_builtin git.c:729
#7 0x414087 in run_argv git.c:818
#8 0x414087 in cmd_main git.c:949
#9 0x40e9ec in main common-main.c:52
#10 0x7f63bc9fa349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 22 byte(s) in 2 object(s) allocated from:
#0 0x7f63bd445e30 in __interceptor_strdup (/usr/lib64/libasan.so.4+0x76e30)
#1 0x98d698 in xstrdup wrapper.c:29
#2 0x916a63 in strvec_push strvec.c:26
#3 0x4be4eb in cmd_for_each_repo builtin/for-each-repo.c:49
#4 0x410dcd in run_builtin git.c:475
#5 0x410dcd in handle_builtin git.c:729
#6 0x414087 in run_argv git.c:818
#7 0x414087 in cmd_main git.c:949
#8 0x40e9ec in main common-main.c:52
#9 0x7f63bc9fa349 in __libc_start_main (/lib64/libc.so.6+0x24349)
See also discussion about the original implementation below - this code
appears to have evolved from a callback explaining the double-strvec-copy
pattern, but there's no strong reason to keep that now:
https://lore.kernel.org/git/68bbeca5-314b-08ee-ef36-040e3f3814e9@gmail.com/
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
relative_url() populates sb. In the normal return path, its buffer is
detached using strbuf_detach(). However the early return path does
nothing with sb, which means that sb's memory is leaked - therefore
we add a release to avoid this leak.
The reset is also only necessary for the normal return path, hence we
move it down to after the early-return to avoid unnecessary work.
LSAN output from t0060:
Direct leak of 121 byte(s) in 1 object(s) allocated from:
#0 0x7f31246f28b0 in realloc (/usr/lib64/libasan.so.4+0xdc8b0)
#1 0x98d7d6 in xrealloc wrapper.c:126
#2 0x909a60 in strbuf_grow strbuf.c:98
#3 0x90bf00 in strbuf_vaddf strbuf.c:401
#4 0x90c321 in strbuf_addf strbuf.c:335
#5 0x5cb78d in relative_url builtin/submodule--helper.c:182
#6 0x5cbe46 in resolve_relative_url_test builtin/submodule--helper.c:248
#7 0x410dcd in run_builtin git.c:475
#8 0x410dcd in handle_builtin git.c:729
#9 0x414087 in run_argv git.c:818
#10 0x414087 in cmd_main git.c:949
#11 0x40e9ec in main common-main.c:52
#12 0x7f3123c41349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 121 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
realpath is only populated if we execute the git_work_tree_initialized
block. However that block also causes us to return early, meaning we
never actually release the strbuf in the case where we populated it.
Therefore we move all strbuf related code into the block to guarantee
that we can't leak it.
LSAN output from t0095:
Direct leak of 129 byte(s) in 1 object(s) allocated from:
#0 0x49a9b9 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x78f585 in xrealloc wrapper.c:126:8
#2 0x713ff4 in strbuf_grow strbuf.c:98:2
#3 0x713ff4 in strbuf_getcwd strbuf.c:597:3
#4 0x4f0c18 in strbuf_realpath_1 abspath.c:99:7
#5 0x5ae4a4 in set_git_work_tree environment.c:259:3
#6 0x6fdd8a in setup_discovered_git_dir setup.c:931:2
#7 0x6fdd8a in setup_git_directory_gently setup.c:1235:12
#8 0x4cb50d in get_bloom_filter_for_commit t/helper/test-bloom.c:41:2
#9 0x4cb50d in cmd__bloom t/helper/test-bloom.c:95:3
#10 0x4caa1f in cmd_main t/helper/test-tool.c:124:11
#11 0x4caded in main common-main.c:52:11
#12 0x7f0869f02349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 129 byte(s) leaked in 1 allocation(s).
It looks like this leak has existed since realpath was first added to
set_git_work_tree() in:
3d7747e318 (real_path: remove unsafe API, 2020-03-10)
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
origin starts off pointing to somewhere within line, which is owned by
the caller. Later we might allocate a new string using xmemdupz() or
xstrfmt(). To avoid leaking these new strings, we introduce a to_free
pointer - which allows us to safely free the newly allocated string when
we're done (we cannot just free origin directly as it might still be
pointing to line).
LSAN output from t0090:
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0xa71f49 in do_xmalloc wrapper.c:41:8
#2 0xa720b0 in do_xmallocz wrapper.c:75:8
#3 0xa720b0 in xmallocz wrapper.c:83:9
#4 0xa720b0 in xmemdupz wrapper.c:99:16
#5 0x8092ba in handle_line fmt-merge-msg.c:187:23
#6 0x8092ba in fmt_merge_msg fmt-merge-msg.c:666:7
#7 0x5ce2e6 in prepare_merge_message builtin/merge.c:1119:2
#8 0x5ce2e6 in collect_parents builtin/merge.c:1215:3
#9 0x5c9c1e in cmd_merge builtin/merge.c:1454:16
#10 0x4ce83e in run_builtin git.c:475:11
#11 0x4ccafe in handle_builtin git.c:729:3
#12 0x4cb01c in run_argv git.c:818:4
#13 0x4cb01c in cmd_main git.c:949:19
#14 0x6b3fad in main common-main.c:52:11
#15 0x7fb929620349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This parameter has not been used since the function was introduced in
8c8195e9c3 (submodule--helper: introduce add-clone subcommand,
2021-07-10).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 'git merge' learned '--autostash' in a03b55530a (merge: teach
--autostash option, 2020-04-07), 'cmd_merge', once it is determined that
we have to create a merge commit, calls 'create_autostash' if
'--autostash' is given.
As explained in a03b55530a, and made more abvious by the tests added in
that commit, the autostash is then applied if the merge succeeds, either
directly or by committing (after conflict resolution or if '--no-commit'
was given), or if the merge is aborted with 'git merge --abort'. In some
other cases, like the user calling 'git reset --merge' or 'git merge
--quit', the autostash is not applied, but saved in the stash list.
However, there exists a scenario that creates an autostash but does not
apply nor save it to the stash list: if the chosen merge strategy
completely fails to handle the merge, i.e. 'try_merge_strategy' returns
2.
Apply the autostash in that case also. An easy way to test that is to
try to merge more than two commits but explicitely ask for the 'recursive'
merge strategy.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 'git merge' learned '--autostash' in a03b55530a (merge: teach
--autostash option, 2020-04-07), 'cmd_merge', in the fast-forward case,
calls 'create_autostash' before calling 'checkout_fast_forward' if
'--autostash' is given.
However, if 'checkout_fast_forward' fails, the autostash is not applied
to the working tree, nor saved in the stash list, since the code simply
calls 'goto done'.
Be more helpful to the user by applying the autostash in that case.
An easy way to test a failing fast-forward is when we are merging a
branch that has a tracked file that conflicts with an untracked file in
the working tree.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for 'git merge --abort' and 'git merge --quit' both
mention the special ref 'MERGE_AUTOSTASH', but this ref is not formally
defined anywhere. Mention it in the description of the '--autostash'
option for 'git merge'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable 'best_strategy' holds the name of the merge strategy that
resulted in fewer conflicts, if several strategies were tried. When
that's the case but the best strategy was not the first one tried, we
inform the user which strategy was the "best" one before recreating the
merge and leaving the conflicted files in the tree.
This informational message is missing the word "strategy", so it shows
something like:
Using the recursive to prepare resolving by hand.
Fix that.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.
The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the prefetching code in unpack-trees.c into its own function,
because it will be used elsewhere in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pack-objects adds an entry to its list of objects to pack, it may
mark the packfile and offset that contains the file, which we can later
use to output the object verbatim. If the packfile is deleted while we
are running (e.g., by another process running "git repack"), we may die
in use_pack() if the pack file cannot be opened.
We worked around this in 4c08018204 (pack-objects: protect against
disappearing packs, 2011-10-14) by making sure we can open the pack
before recording it as a source. This detects a pack which has already
disappeared while generating the packing list, and because we keep the
pack's file descriptor (or an mmap window) open, it means we can access
it later (unless you exceed core.packedgitlimit).
The bitmap code that was added later does not do this; it adds entries
to the packlist without checking that the packfile is still valid, and
is vulnerable to this race. It needs the same treatment as 4c08018204.
However, rather than add it in just that one spot, it makes more sense
to simply open and check the packfile when we open the bitmap.
Technically you can use the .bitmap without even looking in the .pack
file (e.g., if you are just printing a list of objects without accessing
them), but it's much simpler to do it early. That covers all later
direct uses of the pack (due to the cached descriptor) without having to
check each one directly. For example, in pack-objects we need to protect
the packlist entries, but we also access the pack directly as part of
the reuse_partial_pack_from_bitmap() feature. This patch covers both
cases.
There's no test here, because the problem is inherently racy. I
reproduced and verified the fix with this script:
rm -rf parent.git push.git fetch.git
push() {
(
cd push.git &&
echo content >>file &&
git add file &&
git commit -qm "change $1" &&
git push -q origin HEAD &&
echo "push $1..."
) &&
(
cd parent.git &&
git repack -ad -q &&
echo "repack $1..."
)
}
fetch() {
rm -rf fetch.git &&
git clone -q file://$PWD/parent.git fetch.git &&
echo "fetch $1..."
}
git init --bare parent.git &&
git --git-dir=parent.git config transfer.unpacklimit 1 &&
git clone parent.git push.git &&
(for i in `seq 1 1000`; do push $i || break; done) &
pusher=$!
(for i in `seq 1 1000`; do fetch $i || break; done) &
fetcher=$!
wait $fetcher
kill $pusher
That simulates a race between a client cloning and a push triggering a
repack on the server. Without this patch, it generally fails within a
couple hundred iterations with:
remote: fatal: packfile ./objects/pack/.tmp-1377349-pack-498afdec371232bdb99d1757872f5569331da61e.pack cannot be accessed
error: git upload-pack: git-pack-objects died with error.
fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
With this patch, it reliably runs through all thousand attempts.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the bundle tests to fully compare the expected "git ls-remote"
or "git bundle list-heads" output, instead of merely grepping it.
This avoids subtle regressions in the tests. In
f62e0a39b6 (t5704 (bundle): add tests for bundle --stdin, 2010-04-19)
the "bundle --stdin <rev-list options>" test was added to make sure we
didn't include the tag.
But since the --stdin mode didn't work until 5bb0fd2cab (bundle:
arguments can be read from stdin, 2021-01-11) our grepping of
"master" (later "main") missed the important part of the test.
Namely that we should not include the "refs/tags/tag" tag in that
case. Since the test only grepped for "main" in the output we'd miss a
regression in that code.
So let's use test_cmp instead, and also in the other nearby tests
where it's easy.
This does make things a bit more verbose in the case of the test
that's checking the bundle header, since it's different under SHA1 and
SHA256. I think this makes test easier to follow.
I've got some WIP changes to extend the "git bundle" command to dump
parts of the header out, which are easier to understand if we test the
output explicitly like this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change uses of ":" on the LHS of a ">" to the more commonly used
">file" pattern in t/t5607-clone-bundle.sh.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list" learns to omit the "commit <object-name>" header
lines from the output with the `--no-commit-header` option.
* bc/rev-list-without-commit-line:
rev-list: add option for --pretty=format without header
GitHub Actions / CI update.
* js/ci-windows-update:
ci: accelerate the checkout
ci (vs-build): build with NO_GETTEXT
artifacts-tar: respect NO_GETTEXT
ci (windows): transfer also the Git-tracked files to the test jobs
ci: upgrade to using actions/{up,down}load-artifacts v2
ci (vs-build): use `cmd` to copy the DLLs, not `powershell`
ci: use the new GitHub Action to download git-sdk-64-minimal
"git send-email" optimization.
* ab/send-email-optim:
perl: nano-optimize by replacing Cwd::cwd() with Cwd::getcwd()
send-email: move trivial config handling to Perl
perl: lazily load some common Git.pm setup code
send-email: lazily load modules for a big speedup
send-email: get rid of indirect object syntax
send-email: use function syntax instead of barewords
send-email: lazily shell out to "git var"
send-email: lazily load config for a big speedup
send-email: copy "config_regxp" into git-send-email.perl
send-email: refactor sendemail.smtpencryption config parsing
send-email: remove non-working support for "sendemail.smtpssl"
send-email tests: test for boolean variables without a value
send-email tests: support GIT_TEST_PERL_FATAL_WARNINGS=true
Replace the discussion of Travis CI added in
0e5d028a7a (Documentation: add setup instructions for Travis CI,
2016-05-02) with something that covers the GitHub Actions added in
889cacb689 (ci: configure GitHub Actions for CI/PR, 2020-04-11).
The setup is trivial compared to using Travis, and it even works on
Windows (that "hopefully soon" comment was probably out-of-date on
Travis as well).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the section discussing the addition of a SOB trailer above the
section that discusses generating the patch itself. This makes sense
as we don't want someone to go through the process of "git
format-patch", only to realize late that they should have used "git
commit -s" or equivalent.
This is a move-only change, no lines here are being altered, only
moved around.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With multiple heads, we should not allow rebasing or fast-forwarding.
Make sure any fast-forward request calls out specifically the fact that
multiple branches are in play. Also, since we cannot fast-forward to
multiple branches, fix our computation of can_ff.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-pull.txt includes merge-options.txt, which is written assuming
merges will happen. git-pull has allowed rebases for many years; update
the documentation to reflect that.
While at it, pass any `--signoff` flag through to the rebase backend too
so that we don't have to document it as merge-specific. Rebase has
supported the --signoff flag for years now as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have for some time shown a long warning when the user does not
specify how to reconcile divergent branches with git pull. Make it an
error now.
Initial-patch-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the last few precedence tests failing in t7601 by now implementing
the logic to have --[no-]rebase override a pull.ff=only config setting.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are both merge and rebase branches in the logic, and previously
both had to handle fast-forwarding. Merge handled that implicitly
(because git merge handles it directly), while in rebase it was
explicit. Given that the --ff-only flag is meant to override any
--rebase or --no-rebase, make the code reflect that by handling
--ff-only before the merge-vs-rebase logic.
It turns out that this also fixes a bug for submodules. Previously,
when --ff-only was given, the code would run `merge --ff-only` on the
main module, and then run `submodule update --recursive --rebase` on the
submodules. With this change, we still run `merge --ff-only` on the
main module, but now run `submodule update --recursive --checkout` on
the submodules. I believe this better reflects the intent of --ff-only
to have it apply to both the main module and the submodules.
(Sidenote: It is somewhat interesting that all merges pass `--checkout`
to submodule update, even when `--no-ff` is specified, meaning that it
will only do fast-forward merges for submodules. This was discussed in
commit a6d7eb2c7a ("pull: optionally rebase submodules (remote submodule
changes only)", 2017-06-23). The same limitations apply now as then, so
we are not trying to fix this at this time.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pull --rebase=false" means we merge their history into ours, but
it has been described the other way around.
Cc: Stephen Haberman <stephen@exigencecorp.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
[jc: updated the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning about pulling without specifying how to reconcile divergent
branches says that after setting pull.rebase to true, --ff-only can
still be passed on the command line to require a fast-forward. Make that
actually work.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
[en: updated tests; note 3 fixes and 1 new failure]
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were already code checking that --rebase was incompatible with
a merge of multiple heads. However, we were sometimes throwing warnings
about lack of specification of rebase vs. merge when given multiple
heads. Since rebasing is disallowed with multiple merge heads, that
seems like a poor warning to print; we should instead just assume
merging is wanted.
Add a few tests checking multiple merge head behavior.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interaction of rebase and merge flags and options was not well
tested. Add several tests to check for correct behavior from the
following rules:
* --ff-only vs. --[no-]rebase
(and the related pull.ff=only vs. pull.rebase)
* --rebase[=!false] vs. --no-ff and --ff
(and the related pull.rebase=!false overrides pull.ff=!only)
* command line flags take precedence over config, except:
* --no-rebase heeds pull.ff=!only
* pull.rebase=!false vs --no-ff and --ff
For more details behind these rules and a larger table of individual
cases, refer to https://lore.kernel.org/git/xmqqwnpqot4m.fsf@gitster.g/
and the links found therein.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that eventually became filter_bitmap_exclude_type() was
originally introduced in 4f3bd5606a (pack-bitmap: implement BLOB_NONE
filtering, 2020-02-14) to accelerate BLOB_NONE filters with bitmaps.
In 856e12c18a (pack-bitmap.c: make object filtering functions generic,
2020-05-04), it became filter_bitmap_exclude_type(). But not all of the
comments were updated to be agnostic to the provided type.
Remove the remaining comments which should have been updated in
856e12c18a to reflect the type-agnostic nature of the function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running unpack_trees() with a sparse index, we attempt to operate
on the index without expanding the sparse directory entries. Thus, we
operate by manipulating entire directories and passing them to the
unpack function. In the case of the 'git checkout' command, this is the
twoway_merge() function.
There are several cases in twoway_merge() that handle different
situations. One new one to add is the case of a directory/file conflict
where the directory is sparse. Before the sparse index, such a conflict
would appear as a list of file additions and deletions. Now,
twoway_merge() initializes 'current', 'oldtree', and 'newtree' from
src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is
equal to the df_conflict_entry. The way to determine that we have a
directory/file conflict is to test that 'current' and 'newtree' disagree
on being sparse directory entries.
When we are in this case, we want to resolve the situation by calling
merged_entry(). This allows replacing the 'current' entry with the
'newtree' entry. This is important for cases where we want to run 'git
checkout' across the conflict and have the new HEAD represent the new
file type at that path. The first NEEDSWORK comment dropped in t1092
demonstrates this necessary behavior.
However, we still are in a confusing state when 'current' corresponds to
a staged change within a sparse directory that is not present at HEAD.
This should be atypical, because it requires adding a change outside of
the sparse-checkout cone, but it is possible. Since we are unable to
determine that this is a staged change within twoway_merge(), we cannot
add a case to reject the merge at this point. I believe this is due to
the use of df_conflict_entry in the place of 'oldtree' instead of using
the valud at HEAD, which would provide some perspective to this
decision. Any change that would allow this differentiation for staged
entries would need to involve information further up in unpack_trees().
That work should be done, sometime, because we are further confusing the
behavior of a directory/file conflict when staging a change in the
directory. The two cases 'checkout behaves oddly with df-conflict-?' in
t1092 demonstrate that even without a sparse-checkout, Git is not
consistent in its behavior. Neither of the two options seems correct,
either. This change makes the sparse-index behave differently than the
typcial sparse-checkout case, but it does match the full checkout
behavior in the df-conflict-2 case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new branches to the test repo that demonstrate directory/file
conflicts in different ways. Since the directory 'folder1/' has
adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes
searches for 'folder1/' to land in a different place in the index than a
search for 'folder1'. This causes a change in behavior when working with
the df-conflict-1 and df-conflict-2 branches, whose only difference is
that the first uses 'folder1' as the conflict and the other uses
'folder2' which does not have these adjacent files.
We can extend two tests that compare the behavior across different 'git
checkout' commands, and we see already that the behavior will be
different in some cases and not in others. The difference between the
two test loops is that one uses 'git reset --hard' between iterations.
Further, we isolate the behavior of creating a staged change within a
directory and then checking out a branch where that directory is
replaced with a file. A full checkout behaves differently across these
two cases, while a sparse-checkout cone behaves consistently. In both
cases, the behavior is wrong. In one case, the staged change is dropped
entirely. The other case the staged change is kept, replacing the file
at that location, but none of the other files in the directory are kept.
Likely, the correct behavior in this case is to reject the checkout and
report the conflict, leaving HEAD in its previous location. None of the
cases behave this way currently. Use comments to demonstrate that the
tested behavior is only a documentation of the current, incorrect
behavior to ensure we do not _accidentally_ change it. Instead, we would
prefer to change it on purpose with a future change.
At this point, the sparse-index does not handle these 'git checkout'
commands correctly. Or rather, it _does_ reject the 'git checkout' when
we have the staged change, but for the wrong reason. It also rejects the
'git checkout' commands when there is no staged change and we want to
replace a directory with a file. A fix for that unstaged case will
follow in the next change, but that will make the sparse-index agree
with the full checkout case in these documented incorrect behaviors.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The doc for 'submodule.recurse' starts with "Specifies if commands
recurse into submodles by default". This is not exactly true of all
commands that have a '--recurse-submodules' option. For example, 'git
pull --recurse-submodules' does not run 'git pull' in each submodule,
but rather runs 'git submodule update --recursive' so that the submodule
working trees after the pull matches the commits recorded in the
superproject.
Clarify that by just saying that it enables '--recurse-submodules'.
Note that the way this setting interacts with 'fetch.recurseSubmodules'
and 'push.recurseSubmodules', which can have other values than true or
false, is already documented since 4da9e99e6e (doc: be more precise on
(fetch|push).recurseSubmodules, 2020-04-06).
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the end of the FILES section, we indicate that you can override the
regular lookup rules with --global, etc. But:
- we're missing the --local option
- we point to GIT_CONFIG instead of --file, but the latter has much
better documentation
- we're vague about how the overrides work; the actual option
descriptions are much better here
So let's just mention the names and point people back to the OPTIONS
section. We could perhaps even delete this paragraph entirely, but the
presence of the names may give people reading FILES a clue about where
to look for more information.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The scope and utility of the GIT_CONFIG variable was drastically reduced
by dc87183189 (Only use GIT_CONFIG in "git config", not other programs,
2008-06-30). But the documentation in git-config(1) predates that, which
makes it rather misleading.
These days it is really just another way to say "--file". So let's say
that, and explicitly make it clear that it does not impact other Git
commands (like GIT_CONFIG_SYSTEM, etc, would).
I also bumped it to the bottom of the list of variables, and warned
people off of using it. We don't have any plans for deprecation at this
point, but there's little point in encouraging people to use it by
putting it at the top of the list.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The explanation for the --file option only refers to GIT_CONFIG. This
redirection to an environment variable is confusing, but doubly so
because the description of GIT_CONFIG is out of date.
Let's describe --file from scratch, detailing both the reading and
writing behavior as we do for other similar options like --system, etc.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge algorithm mostly consists of the following three functions:
collect_merge_info()
detect_and_process_renames()
process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames(). When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.
Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames. It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.
Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.
However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
collect_merge_info()
detect_and_process_renames()
process_entries()
into
collect_merge_info()
detect_and_process_renames()
<cache all the renames, and restart>
collect_merge_info()
detect_and_process_renames()
process_entries()
This may seem odd and look like more work. However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small. While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up). And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.
This restarting only makes sense if we can save recursing into enough
directories to make it worth our while. Introduce a simple heuristic to
guide this. Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air. It could be that
tweaking this number to make it either higher or lower improves the
optimization. (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases. I was
certain it was higher than needed. Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little. If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)
Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 205.1 ms ± 3.8 ms 204.2 ms ± 3.0 ms
mega-renames: 1.564 s ± 0.010 s 1.076 s ± 0.015 s
just-one-mega: 479.5 ms ± 3.9 ms 364.1 ms ± 7.0 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This combines the work of the last several patches, and implements the
conditions when we don't need to recurse into directories. It's perhaps
easiest to see the logic by separating the fact that a directory might
have both rename sources and rename destinations:
* rename sources: only files present in the merge base can serve as
rename sources, and only when one side deletes that file. When the
tree on one side matches the merge base, that means every file
within the subtree matches the merge base. This means that the
skip-irrelevant-rename-detection optimization from before kicks in
and we don't need any of these files as rename sources.
* rename destinations: the tree that does not match the merge base
might have newly added and hence unmatched destination files.
This is what usually prevents us from doing trivial directory
resolutions in the merge machinery. However, the fact that we have
deferred recursing into this directory until the end means we know
whether there are any unmatched relevant potential rename sources
elsewhere in this merge. If there are no unmatched such relevant
sources anywhere, then there is no need to look for unmatched
potential rename destinations to match them with.
This informs our algorithm:
* Search through relevant_sources; if we have entries, they better all
be reflected in cached_pairs or cached_irrelevant, otherwise they
represent an unmatched potential rename source (causing the
optimization to be disallowed).
* For any relevant_source represented in cached_pairs, we do need to
to make sure to get the destination for each source, meaning we need
to recurse into any ancestor directories of those destinations.
* Once we've recursed into all the rename destinations for any
relevant_sources in cached_pairs, we can then do the trivial
directory resolution for the remaining directories.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.235 s ± 0.042 s 205.1 ms ± 3.8 ms
mega-renames: 9.419 s ± 0.107 s 1.564 s ± 0.010 s
just-one-mega: 480.1 ms ± 3.9 ms 479.5 ms ± 3.9 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When one side of history matches the merge base (including when the
merge base has no entry for the given directory), have
collect_merge_info_callback() defer recursing into the directory. To
ensure those entries are eventually handled, add a call to
handled_deferred_entries() in collect_merge_info() after
traverse_trees() returns.
Note that the condition in collect_merge_info_callback() may look more
complicated than necessary at first glance;
renames->trivial_merges_okay[side] is always true until
handle_deferred_entries() is called, and possible_trivial_merges[side]
is always empty right now (and in the future won't be filled until
handle_deferred_entries() is called). However, when
handle_deferred_entries() calls traverse_trees() for the relevant
deferred directories, those traverse_trees() calls will once again end
up in collect_merge_info_callback() for all the entries under those
subdirectories. The extra conditions are there for such deferred cases
and will be used more as we do more with those variables.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to allow trivial directory resolution, we first need to be able
to gather more information to determine if the optimization is safe. To
enable that, we need a way of deferring the recursion into the directory
until a later time. Naturally, deferring the entry into a subtree means
that we need some function that will later recurse into the subdirectory
exactly the same way that collect_merge_info_callback() would have done.
Add a helper function that does this. For now this function is not used
but a subsequent commit will change that. Future commits will also make
the function sometimes resolve directories instead of traversing inside.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted a few commits ago, we can resolve individual files early if all
three sides of the merge have a file at the path and two of the three
sides match. We would really like to do the same thing with
directories, because being able to do a trivial directory resolve means
we don't have to recurse into the directory, potentially saving us a
huge amount of time in both collect_merge_info() and process_entries().
Unfortunately, resolving directories early would mean missing any
renames whose source or destination is underneath that directory.
If we somehow knew there weren't any renames under the directory in
question, then we could resolve it early. Sadly, it is impossible to
determine whether there are renames under the directory in question
without recursing into it, and this has traditionally kept us from ever
implementing such an optimization.
In commit f89b4f2bee ("merge-ort: skip rename detection entirely if
possible", 2021-03-11), we added an additional reason that rename
detection could be skipped entirely -- namely, if no *relevant* sources
were present. Without completing collect_merge_info_callback(), we do
not yet know if there are no relevant sources. However, we do know that
if the current directory on one side matches the merge base, then every
source file within that directory will not be RELEVANT_CONTENT, and a
few simple checks can often let us rule out RELEVANT_LOCATION as well.
This suggests we can just defer recursing into such directories until
the end of collect_merge_info.
Since the deferred directories are known to not add any relevant sources
due to the above properties, then if there are no relevant sources after
we've traversed all paths other than the deferred ones, then we know
there are not any relevant sources. Under those conditions, rename
detection is unnecessary, and that means we can resolve the deferred
directories without recursing into them.
Note that the logic for skipping rename detection was also modified
further in commit 76e253793c ("merge-ort, diffcore-rename: employ cached
renames when possible", 2021-01-30); in particular rename detection can
be skipped if we already have cached renames for each relevant source.
We can take advantage of this information as well with our deferral of
recursing into directories where one side matches the merge base.
Add some data structures that we will use to do these deferrals, with
some lengthy comments explaining their purpose.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous patch possibly raises some questions about whether
additional cases in collect_merge_info_callback() can be handled early.
Add some explanations in the form of comments to help explain these
better. While we're at it, add a few comments to denote what a few
boolean '0' or '1' values stand for.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there are no directories involved at a given path, and all three
sides have a file at that path, and two of the three sides of history
match, we can immediately resolve the merge of that path in
collect_merge_info() and do not need to wait until process_entries().
This is actually a very minor improvement: half the time when I run it,
I see an improvement; the other half a slowdown. It seems to be in the
range of noise. However, this idea serves as the beginning of some
bigger optimizations coming in the following patches.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases in t0000 are quite special given that they many of them run
nested testcases to verify that testing functionality itself works as
expected. These nested testcases are realized by writing a new ad-hoc
test script which again sources test-lib.sh, where the new script is
created in a nested subdirectory located beneath the current trash
directory. We then execute the new test script with the nested
subdirectory as current working directory and explicitly re-export
TEST_OUTPUT_DIRECTORY to point to that directory.
While this works as expected in the general case, it falls apart when
the developer has TEST_OUTPUT_DIRECTORY explicitly defined either via
the environment or via config.mak and runs "make test". In that case,
test-lib.sh will clobber the value that we've just carefully set up to
instead contain what the developer has defined. As a result, the
TEST_OUTPUT_DIRECTORY continues to point at the root output directory,
not at the nested one.
This issue causes breakage in the 'test_atexit is run' test case: the
nested test case writes files into "../../", which is assumed to be the
parent's trash directory. But because TEST_OUTPUT_DIRECTORY already
points to to the root output directory, we instead end up writing those
files outside of the output directory. The parent test case will then
try to check whether those files still exist in its own trash directory,
which thus must fail now.
Fix the issue by adding a new TEST_OUTPUT_DIRECTORY_OVERRIDE variable.
If set, then we'll always override the TEST_OUTPUT_DIRECTORY with its
value after sourcing GIT-BUILD-OPTIONS.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cd57bc41bb (builtin/multi-pack-index.c: display usage on
unrecognized command, 2021-03-30) we have used a "usage" label to avoid
having two separate callers of usage_with_options (one when no arguments
are given, and another for unrecognized sub-commands).
But the first caller has been broken since cd57bc41bb, since it will
happily jump to usage without arguments, and then pass argv[0] to the
"unrecognized subcommand" error.
Many compilers will save us from a segfault here, but the end result is
ugly, since it mentions an unrecognized subcommand when we didn't even
pass one, and (on GCC) includes "(null)" in its output.
Move the "usage" label down past the error about unrecognized
subcommands so that it is only triggered when it should be. While we're
at it, bulk up our test coverage in this area, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the empty prefix ("") stand out better.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t0000, we run several fake "sub-test" suites to verify the behavior
of the test suite. But because we don't clear the parent environment
completely, the sub-tests can be fooled by variables meant for the
parent. For example:
GIT_SKIP_TESTS=t1234 ./t0000-basic.sh
fails when a sub-test expects its fake t1234 to actually run. This
particular pattern is unlikely in practice; we're running a single
script, and there is no t1234 in the real test suite anyway (not yet, at
least). A more real-world example is:
GIT_SKIP_TESTS=t[^0]* make test
to run only the t0* tests.
The fix is conceptually simple: we should clear the GIT_SKIP_TESTS
variable when running the sub-tests, because its contents (if any) will
be meant for the main test suite. This is easy to do centrally in our
sub-test helper.
But there's a catch: some of our tests do set GIT_SKIP_TESTS
intentionally to test the feature. We need to allow them to continue to
set it, but clear it for all the other tests. And the sub-test helper
can't tell if the GIT_SKIP_TESTS it sees is from a test or not. We can
handle this by adding a new option to the helper to let callers specify
the skip list.
I considered adding a more general "--eval" option to let callers set up
the env for the sub-test however they like. That would cover this case
and possible future ones. But the quoting gets awkward for the callers
(since we're now 2 layers deep in evals!), so I went with the simpler
more specific solution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shell+perl "[de]packetize()" helper functions were added in
4414a15002 (t/lib-git-daemon: add network-protocol helpers,
2018-01-24), and around the same time we added the "pkt-line" helper
in 74e7002961 (test-pkt-line: introduce a packet-line test helper,
2018-03-14).
For some reason it seems we've mostly used the shell+perl version
instead of the helper since then. There were discussions around
88124ab263 (test-lib-functions: make packetize() more efficient,
2020-03-27) and cacae4329f (test-lib-functions: simplify packetize()
stdin code, 2020-03-29) to improve them and make them more efficient.
There was one good reason to do so, we needed an equivalent of
"test-tool pkt-line pack", but that command wasn't capable of handling
input with "\n" (a feature) or "\0" (just because it happens to be
printf-based under the hood).
Let's add a "pkt-line-raw" helper for that, and expose is at a
packetize_raw() to go with the existing packetize() on the shell
level, this gives us the smallest amount of change to the tests
themselves.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the documentation not to assume users are of certain gender
and adds to guidelines to do so.
* ds/gender-neutral-doc:
*: fix typos
comments: avoid using the gender of our users
doc: avoid using the gender of other people
Prepare the internals for lazily fetching objects in submodules
from their promisor remotes.
* jt/partial-clone-submodule-1:
promisor-remote: teach lazy-fetch in any repo
run-command: refactor subprocess env preparation
submodule: refrain from filtering GIT_CONFIG_COUNT
promisor-remote: support per-repository config
repository: move global r_f_p_c to repo struct
Code cleanup around struct_type_init() functions.
* ab/struct-init:
string-list.h users: change to use *_{nodup,dup}()
string-list.[ch]: add a string_list_init_{nodup,dup}()
dir.[ch]: replace dir_init() with DIR_INIT
*.c *_init(): define in terms of corresponding *_INIT macro
*.h: move some *_INIT to designated initializers
Test clean-up.
* hn/refs-test-cleanup:
t7509: avoid direct file access for writing CHERRY_PICK_HEAD
t1415: avoid direct filesystem access for writing refs
Code clean-up and leak plugging in "git bundle".
* ab/bundle-updates:
bundle: remove "ref_list" in favor of string-list.c API
bundle.c: use a temporary variable for OIDs and names
bundle cmd: stop leaking memory from parse_options_cmd_bundle()
Fill test gaps.
* ab/mktag-tests:
mktag tests: test fast-export
mktag tests: test for-each-ref
mktag tests: test update-ref and reachable fsck
mktag tests: test hash-object --literally and unreachable fsck
mktag tests: invert --no-strict test
mktag tests: parse out options in helper
Fill test gaps.
* ab/show-branch-tests:
show-branch tests: add missing tests
show-branch: don't <COLOR></RESET> for space characters
show-branch tests: modernize test code
show-branch tests: rename the one "show-branch" test file
Code recently added to support common ancestry negotiation during
"git push" did not sanity check its arguments carefully enough.
* ab/fetch-negotiate-segv-fix:
fetch: fix segfault in --negotiate-only without --negotiation-tip=*
fetch: document the --negotiate-only option
send-pack.c: move "no refs in common" abort earlier
Update the location of system-side configuration file on Windows.
* js/gfw-system-config-loc-fix:
config: normalize the path of the system gitconfig
cmake(windows): set correct path to the system Git config
mingw: move Git for Windows' system config where users expect it
When rebuilding the multi-pack index file reusing an existing one,
we used to blindly trust the existing file and ended up carrying
corrupted data into the updated file, which has been corrected.
* tb/midx-use-checksum:
midx: report checksum mismatches during 'verify'
midx: don't reuse corrupt MIDXs when writing
commit-graph: rewrite to use checksum_valid()
csum-file: introduce checksum_valid()
The merge code had funny interactions between content based rename
detection and directory rename detection.
* en/merge-dir-rename-corner-case-fix:
merge-recursive: handle rename-to-self case
merge-ort: ensure we consult df_conflict and path_conflicts
t6423: test directory renames causing rename-to-self
Performance tweaks of "git merge -sort" around lazy fetching of objects.
* en/ort-perf-batch-13:
merge-ort: add prefetching for content merges
diffcore-rename: use a different prefetch for basename comparisons
diffcore-rename: allow different missing_object_cb functions
t6421: add tests checking for excessive object downloads during merge
promisor-remote: output trace2 statistics for number of objects fetched
More fix-ups and optimization to "merge -sort".
* en/ort-perf-batch-12:
merge-ort: miscellaneous touch-ups
Fix various issues found in comments
diffcore-rename: avoid unnecessary strdup'ing in break_idx
merge-ort: replace string_list_df_name_compare with faster alternative
Technical writing seeks to convey information with minimal
friction. One way that a reader can experience friction is if they
encounter a description of "a user" that is later simplified using a
gendered pronoun. If the reader does not consider that pronoun to
apply to them, then they can experience cognitive dissonance that
removes focus from the information.
Give some basic tips to guide us avoid unnecessary uses of gendered
description.
Using a gendered pronoun is appropriate when referring to a specific
person.
There are acceptable existing uses of gendered pronouns within the
Git codebase, such as:
* References to real people (e.g. Linus Torvalds, "the Git maintainer").
Do not misgender real people. If there is any doubt to the gender of a
person, then avoid using pronouns.
* References to fictional people with clear genders (e.g. Alice and
Bob).
* Sample text used in test cases (e.g t3702, t6432).
* The official text of the GPL license contains uses of "he or she",
but using singular "they" (or modifying the text in some other
way) is not within the scope of the Git project.
* Literal email messages in Documentation/howto/ should not be edited
for grammatical concerns such as this, unless we update the entire
document to fit the standard documentation format. If such an effort is
taken on, then the authorship would change and no longer refer to the
exact mail message.
* External projects consumed in contrib/ should not deviate solely for
style reasons. Recommended edits should be contributed to those
projects directly.
Other cases within the Git project were cleaned up by the previous
changes.
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 'OPT_ALIAS' was created in 5c387428f1 (parse-options: don't emit
"ambiguous option" for aliases, 2019-04-29), 'git clone
--git-completion-helper', which is used by the Bash completion script to
list options accepted by clone (via '__gitcomp_builtin'), lists both
'--recurse-submodules' and its alias '--recursive', which was not the
case before since '--recursive' had the PARSE_OPT_HIDDEN flag set, and
options with this flag are skipped by 'parse-options.c::show_gitcomp',
which implements 'git <cmd> --git-completion-helper'.
This means that typing 'git clone --recurs<TAB>' will yield both
'--recurse-submodules' and '--recursive', which is not ideal since both
do the same thing, and so the completion should directly complete the
canonical option.
At the point where 'show_gitcomp' is called in 'parse_options_step',
'preprocess_options' was already called in 'parse_options', so any
aliases are now copies of the original options with a modified help text
indicating they are aliases.
Helpfully, since 64cc539fd2 (parse-options: don't leak alias help
messages, 2021-03-21) these copies have the PARSE_OPT_FROM_ALIAS flag
set, so check that flag early in 'show_gitcomp' and do not print them,
unless the user explicitely requested that *all* completion be shown (by
setting 'GIT_COMPLETION_SHOW_ALL'). After all, if we want to encourage
the use of '--recurse-submodules' over '--recursive', we'd better just
suggest the former.
The only other options alias is 'log' and friends' '--mailmap', which is
an alias for '--use-mailmap', but the Bash completion helpers for these
commands do not use '__gitcomp_builtin', and thus are unnaffected by
this change.
Test the new behaviour in t9902-completion.sh. As a side effect, this
also tests the correct behaviour of GIT_COMPLETION_SHOW_ALL, which was
not tested before. Note that since '__gitcomp_builtin' caches the
options it shows, we need to re-source the completion script to clear
that cache for the second test.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These were last bumped in commit 92c57e5c1d (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).
Since that time:
* Linus has continued recommending kernel folks to set
diff.renameLimit=0 (maps to 32767, currently)
* Folks with repositories with lots of renames were happy to set
merge.renameLimit above 32767, once the code supported that, to
get correct cherry-picks
* Processors have gotten faster
* It has been discovered that the timing methodology used last time
probably used too large example files.
The last point is probably worth explaining a bit more:
* The "average" file size used appears to have been average blob size
in the linux kernel history at the time (probably v2.6.25 or
something close to it).
* Since bigger files are modified more frequently, such a computation
weights towards larger files.
* Larger files may be more likely to be modified over time, but are
not more likely to be renamed -- the mean and median blob size
within a tree are a bit higher than the mean and median of blob
sizes in the history leading up to that version for the linux
kernel.
* The mean blob size in v2.6.25 was half the average blob size in
history leading to that point
* The median blob size in v2.6.25 was about 40% of the mean blob size
in v2.6.25.
* Since the mean blob size is more than double the median blob size,
any file as big as the mean will not be compared to any files of
median size or less (because they'd be more than 50% dissimilar).
* Since it is the number of files compared that provides the O(n^2)
behavior, median-sized files should matter more than mean-sized
ones.
The combined effect of the above is that the file size used in past
calculations was likely about 5x too large. Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.
Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly. In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.) For now, let's just bump the approximate
time limit from 10s to 1m.
(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)
Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):
N Timing
1300 1.995s
7100 59.973s
So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.
Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):
#!/bin/bash
n=$1; shift
rm -rf repo
mkdir repo && cd repo
git init -q -b main
mkdata() {
mkdir $1
for i in `seq 1 $2`; do
(sed "s/^/$i /" <../sample
echo tag: $1
) >$1/$i
done
}
mkdata initial $n
git add .
git commit -q -m initial
mkdata new $n
git add .
cd new
for i in *; do git mv $i $i.renamed; done
cd ..
git rm -q -rf initial
git commit -q -m new
time git diff-tree -M -l0 --summary HEAD^ HEAD
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 89973554b5 (diffcore-rename: make diff-tree -l0 mean
-l<large>, 2017-11-29), -l0 was given a special magical "large" value,
but one which was not large enough for some uses (as can be seen from
commit 9f7e4bfa3b (diff: remove silent clamp of renameLimit,
2017-11-13). Make 0 (or a negative value) be treated as unlimited
instead and update the documentation to mention this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection. The following two commits each
introduced an exception to this:
9027f53cb5 (Do linear-time/space rename logic for exact renames,
2007-10-25)
bd24aa2f97 (diffcore-rename: guide inexact rename detection based
on basenames, 2021-02-14)
(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection. So, for copy detection, the documentation was
closer to correct.)
Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm. While at it, also note the
default values for all these settings.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning when quadratic rename detection was skipped referred to
"inexact rename detection". For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection. However, that changed with commit bd24aa2f97
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14). Let's instead use the term "exhaustive rename detection"
to refer to the quadratic portion.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default reason stored in the lock file, "added with --lock",
is unlikely to be what the user would have given in a separate
`git worktree lock` command. Allowing `--reason` to be specified
along with `--lock` when adding a working tree gives the user control
over the reason for locking without needing a second command.
Signed-off-by: Stephen Manz <smanz@alum.mit.edu>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a full hexadecimal hash is given as a --negotiation-tip to "git
fetch", and that hash does not correspond to an object, "git fetch" will
segfault if --negotiate-only is given and will silently ignore that hash
otherwise. Make these cases fatal errors, just like the case when an
invalid ref name or abbreviated hash is given.
While at it, mark the error messages as translatable.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 477673d6f3 ("send-pack: support push negotiation", 2021-05-05)
did not test the case in which a remote advertises at least one ref. In
such a case, "remote_refs" in get_commons_through_negotiation() in
send-pack.c would also contain those refs with a zero ref->new_oid (in
addition to the refs being pushed with a nonzero ref->new_oid). Passing
them as negotiation tips to "git fetch" causes an error, so filter them
out.
(The exact error that would happen in "git fetch" in this case is a
segmentation fault, which is unwanted. This will be fixed in the
subsequent commit.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 477673d6f3 ("send-pack: support push negotiation", 2021-05-05)
introduced the push.negotiate config variable and included a test. The
test only covered pushing without a remote helper, so the fact that
pushing with a remote helper doesn't work went unnoticed.
This is ultimately caused by the "url" field not being set in the args
struct. This field being unset probably went unnoticed because besides
push negotiation, this field is only used to generate a "pushee" line in
a push cert (and if not given, no such line is generated). Therefore,
set this field.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During a run of the `check-whitespace` we want to verify that the
commits introduced in the Pull Request have no whitespace issues. We
only want to look at those commits, not the upstream commits (because
the contributor cannot do anything about the latter).
However, by using the `-<count>` form in `git log --check`, we run the
risk of looking at the wrong commits. The reason is that the
`actions/checkout` step does _not_ check out the tip commit of the Pull
Request's branch: Instead, it checks out a merge commit that merges that
branch into the target branch. For that reason, we already adjust the
commit count by incrementing it, but that is not enough: if the upstream
branch has newer commits, they are traversed _first_. And obviously we
will then miss some of the commits that we _actually_ wanted to look at.
Therefore, let's be careful to stop assuming a linear, up to date commit
topology in the contributed commits, and instead specify the correct
commit range.
Unfortunately, this means that we no longer can rely on a shallow clone:
There is no way of knowing just how many commits the upstream branch
advanced after the commit from which the PR branch branched off. So
let's just go with a full clone instead, and be safe rather than sorry
(if we have "too shallow" a situation, a commit range `@{u}..` may very
well include a shallow commit itself, and the output of `git show
--check <shallow>` is _not_ pretty).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of some recent security tightening, GitHub introduced the
ability to configure GitHub workflows to be run with a read-only token.
This is much more secure, in particular when working in a public
repository: While the regular read/write token might be restricted to
writing to the current branch, it is not necessarily restricted to
access only the current Pull Request.
However, the `check-whitespace` workflow threw a wrench into this plan:
it _requires_ write access (because it wants to add a PR comment in case
of a whitespace issue).
Let's just skip that PR comment. The user can always click through to
the actual error, even if it is slightly less convenient.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous changes did the necessary improvements to unpack-trees.c and
diff-lib.c in order to modify a sparse index based on its comparision
with a tree. The only remaining work is to remove some
ensure_full_index() calls and add tests that verify that the index is
not expanded in our interesting cases. Include 'switch' and 'restore' in
these tests, as they share a base implementation with 'checkout'.
Here are the relevant performance results from
p2000-sparse-operations.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------
2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1%
2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7%
2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7%
2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3%
It is important to compare the full index case to the sparse index case,
as the previous results for the sparse index were inflated by the index
expansion. For index v4, this is an 88% improvement.
On an internal repository with over two million paths at HEAD and a
sparse-checkout definition containing ~60,000 of those paths, 'git
checkout' went from 3.5s to 297ms with this change. The theoretical
optimum where only those ~60,000 paths exist was 275ms, so the extra
sparse directory entries contribute a 22ms overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When some commands run with command_requires_full_index=1, then the
index can get in a state where the in-memory cache tree is actually
equal to the sparse index's cache tree instead of the full one.
This results in incorrect entry_count values. By clearing the cache
tree before converting to sparse, we avoid this issue.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update 'git commit' to allow using the sparse-index in memory without
expanding to a full one. The only place that had an ensure_full_index()
call was in cache_tree_update(). The recursive algorithm for
update_one() was already updated in 2de37c536 (cache-tree: integrate
with sparse directory entries, 2021-03-03) to handle sparse directory
entries in the index.
Most of this change involves testing different command-line options that
allow specifying which on-disk changes should be included in the commit.
This includes no options (only take currently-staged changes), -a (take
all tracked changes), and --include (take a list of specific changes).
To simplify testing that these options do not expand the index, update
the test that previously verified that 'git status' does not expand the
index with a helper method, ensure_not_expanded().
This allows 'git commit' to operate much faster when the sparse-checkout
cone is much smaller than the full list of files at HEAD.
Here are the relevant lines from p2000-sparse-operations.sh:
Test HEAD~1 HEAD
----------------------------------------------------------------------------------
2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9%
2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3%
2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7%
2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8%
It is important to compare the full-index case to the sparse-index case,
so the improvement for index version v4 is actually an 88% improvement in
this synthetic example.
In a real repository with over two million files at HEAD and 60,000
files in the sparse-checkout definition, the time for 'git commit -a'
went from 2.61 seconds to 134ms. I compared this to the result if the
index only contained the paths in the sparse-checkout definition and
found the theoretical optimum to be 120ms, so the out-of-cone paths only
add a 12% overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By using shorter names for the test repos, we will get a slightly more
compressed performance summary without comprimising clarity.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we increase our list of commands to test in
p2000-sparse-operations.sh, we will want to have a slightly smaller test
repository. Reduce the size by a factor of four by reducing the depth of
the step that creates a big index around a moderately-sized repository.
Also add a step to run 'git checkout -' on repeat. This requires having
a previous location in the reflog, so add that to the initialization
steps.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several situations where a repository with sparse-checkout
enabled will act differently than a normal repository, and in ways that
are not intentional. The test t1092-sparse-checkout-compatibility.sh
documents some of these deviations, but a casual reader might think
these are intentional behavior changes.
Add comments on these tests that make it clear that these behaviors
should be updated. Using 'NEEDSWORK' helps contributors find that these
are potential areas for improvement.
Helped-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.
While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.
These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.
This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:
* The sparse-checkout repo (with a full index) would output each path
name that is intended to be added.
* The sparse-index repo would only output that "folder1/" is staged for
addition.
The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().
In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.
This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.
The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:
Test HEAD~1 HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9%
Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.
Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.
Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.
Use diff_tree_oid() appropriately to compute the diff.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have sparse directory entries in the index, we want to compare
that directory against sparse-checkout patterns. Those pattern matching
algorithms are built expecting a file path, not a directory path. This
is especially important in the "cone mode" patterns which will match
files that exist within the "parent directories" as well as the
recursive directory matches.
If path_matches_pattern_list() is given a directory, we can add a fake
filename ("-") to the directory and get the same results as before,
assuming we are in cone mode. Since sparse index requires cone mode
patterns, this is an acceptable assumption.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.
In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_single_entry() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().
There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path. Use the new helper method
sparse_dir_matches_path() for this. We also need to ignore conflict
markers in the case that the entries correspond to directories and we
already have a sparse directory entry.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next change, we will use this method to unpack a sparse directory
entry, so change the name to unpack_single_entry() so these entries
apply. The new name reflects that we will not recurse into trees in
order to resolve the conflicts.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.
Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a dirctory.
Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry. The final check is a
length comparison between the strings.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.
The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.
Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.
Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.
Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:
* Add sparse-directory entries on both sides of directories within the
sparse-checkout definition.
* Add directories outside the sparse-checkout definition who have only
one entry and are the first entry of a directory with multiple
entries.
* Add filenames adjacent to a sparse directory entry that sort before
and after the trailing slash.
Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes the test data shape to be as expected, allowing rename
detection to work properly now that the 'larger-content' file actually
has meaningful lines.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
flag is not included. The CE_EXTENDED flag would exist if we loaded a
full index from disk with these entries marked with CE_SKIP_WORKTREE, so
we can add the flag here to be consistent. This allows us to directly
compare the flags present in cache entries when testing the sparse-index
feature, but has no significance to its correctness in the user-facing
functionality.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.
However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.
Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.
It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Occasionally we receive reviews after patches were integrated, where
`sparse` (https://sparse.docs.kernel.org/en/latest/ has more information
on that project) identified problems such as file-local variables or
functions being declared as global.
By running `sparse` as part of our Continuous Integration, we can catch
such things much earlier. Even better: developers who activated GitHub
Actions on their forks can catch such issues before even sending their
patches to the Git mailing list.
This addresses https://github.com/gitgitgadget/git/issues/345
Note: Not even Ubuntu 20.04 ships with a new enough version of `sparse`
to accommodate Git's needs. The symptom looks like this:
add-interactive.c:537:51: error: Using plain integer as NULL pointer
To counter that, we download and install the custom-built `sparse`
package from the Azure Pipeline that we specifically created to address
this issue.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
Rather than calling `parse_object()`, we go for `oid_object_info()` and
then `lookup_object_by_type()` using the type just discovered. As
detailed in the commit message, this provides a significant time saving.
Unfortunately, it also changes the behavior: We lose all annotated tags
from the decoration.
The reason this happens is in the loop where we try to peel the tags, we
won't necessarily have parsed that first object. If we haven't, its
`tagged` field will be NULL, so we won't actually add a decoration for
the pointed-to object.
Make sure to parse the tag object at the top of the peeling loop. This
effectively restores the pre-88473c8bae parsing -- but only of tags,
allowing us to keep most of the possible speedup from 88473c8bae.
On my big ~220k ref test case (where it's mostly non-tags), the
timings [using "git log -1 --decorate"] are:
- before either patch: 2.945s
- with my broken patch: 0.707s
- with [this patch]: 0.788s
The simplest way to do this is to just conditionally parse before the
loop:
if (obj->type == OBJ_TAG)
parse_object(&obj->oid);
But we can observe that our tag-peeling loop needs to peel already, to
examine recursive tags-of-tags. So instead of introducing a new call to
parse_object(), we can simply move the parsing higher in the loop:
instead of parsing the new object before we loop, parse each tag object
before we look at its "tagged" field.
This has another beneficial side effect: if a tag points at a commit (or
other non-tag type), we do not bother to parse the commit at all now.
And we know it is a commit without calling oid_object_info(), because
parsing the surrounding tag object will have created the correct in-core
object based on the "type" field of the tag.
Our test coverage for --decorate was obviously not good, since we missed
this quite-basic regression. The new tests covers an annotated tag
(showing the fix), but also that we correctly show annotations for
lightweight tags and double-annotated tag-of-tags.
Reported-by: Martin Ågren <martin.agren@gmail.com>
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- default lock string, "added with --lock"
- temporary lock string, "initializing"
Signed-off-by: Stephen Manz <smanz@alum.mit.edu>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- remove unneeded `git rev-parse` which must have come from a copy-paste
of another test
- unlock the worktree with test_when_finished
Signed-off-by: Stephen Manz <smanz@alum.mit.edu>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep --and -e foo" ought to have been diagnosed as an error
but instead segfaulted, which has been corrected.
* rs/grep-parser-fix:
grep: report missing left operand of --and
The "union" conflict resolution variant misbehaved when used with
binary merge driver.
* jk/union-merge-binary:
ll_union_merge(): rename path_unused parameter
ll_union_merge(): pass name labels to ll_xdl_merge()
ll_binary_merge(): handle XDL_MERGE_FAVOR_UNION
Various updates to tests around "git describe"
* ab/describe-tests-fix:
describe tests: support -C in "check_describe"
describe tests: fix nested "test_expect_success" call
describe tests: don't rely on err.actual from "check_describe"
describe tests: refactor away from glob matching
describe tests: improve test for --work-tree & --dirty
Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.
* ab/pickaxe-pcre2: (22 commits)
xdiff-interface: replace discard_hunk_line() with a flag
xdiff users: use designated initializers for out_line
pickaxe -G: don't special-case create/delete
pickaxe -G: terminate early on matching lines
xdiff-interface: allow early return from xdiff_emit_line_fn
xdiff-interface: prepare for allowing early return
pickaxe -S: slightly optimize contains()
pickaxe: rename variables in has_changes() for brevity
pickaxe -S: support content with NULs under --pickaxe-regex
pickaxe: assert that we must have a needle under -G or -S
pickaxe: refactor function selection in diffcore-pickaxe()
perf: add performance test for pickaxe
pickaxe/style: consolidate declarations and assignments
diff.h: move pickaxe fields together again
pickaxe: die when --find-object and --pickaxe-all are combined
pickaxe: die when -G and --pickaxe-regex are combined
pickaxe tests: add missing test for --no-pickaxe-regex being an error
pickaxe tests: test for -G, -S and --find-object incompatibility
pickaxe tests: add test for "log -S" not being a regex
pickaxe tests: add test for diffgrep_consume() internals
...
Preliminary clean-up of tests before the main reftable changes
hits the codebase.
* hn/prep-tests-for-reftable: (22 commits)
t1415: set REFFILES for test specific to storage format
t4202: mark bogus head hash test with REFFILES
t7003: check reflog existence only for REFFILES
t7900: stop checking for loose refs
t1404: mark tests that muck with .git directly as REFFILES.
t2017: mark --orphan/logAllRefUpdates=false test as REFFILES
t1414: mark corruption test with REFFILES
t1407: require REFFILES for for_each_reflog test
test-lib: provide test prereq REFFILES
t5304: use "reflog expire --all" to clear the reflog
t5304: restyle: trim empty lines, drop ':' before >
t7003: use rev-parse rather than FS inspection
t5000: inspect HEAD using git-rev-parse
t5000: reformat indentation to the latest fashion
t1301: fix typo in error message
t1413: use tar to save and restore entire .git directory
t1401-symbolic-ref: avoid direct filesystem access
t1401: use tar to snapshot and restore repo state
t5601: read HEAD using rev-parse
t9300: check ref existence using test-helper rather than a file system check
...
Some more code and doc clarification around "git push".
* fc/push-simple-updates-cleanup:
push: don't get a full remote object
push: only check same_remote when needed
push: remove trivial function
push: remove redundant check
push: factor out the typical case
push: get rid of all the setup_push_* functions
push: trivial simplifications
push: make setup_push_* return the dst
push: only get the branch when needed
push: factor out null branch check
push: split switch cases
push: return immediately in trivial switch case
push: create new get_upstream_ref() helper
"git cat-file --batch-all-objects"" misbehaved when "--batch" is in
use and did not ask for certain object traits.
* zh/cat-file-batch-fix:
cat-file: merge two block into one
cat-file: handle trivial --batch format with --batch-all-objects
Add the missing __attribute__((format)) checking to
advise_if_enabled(). This revealed a trivial issue introduced in
b3b18d1621 (advice: revamp advise API, 2020-03-02). We treated the
argv[1] as a format string, but did not intend to do so. Let's use
"%s" and pass argv[1] as an argument instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing format attributes to API functions that take printf
arguments.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing __attribute__((format)) function attributes to various
"static" functions that take printf arguments.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git diff --merge-base` was introduced at around Git 2.30, the
documentation included a few errors.
In the example given for `git diff --cached --merge-base`, the
`--cached` flag was omitted for the `--merge-base` example. Add the
missing flag.
In the `git diff <commit>` case, we failed to mention that
`--merge-base` is an available option. Give the usage of `--merge-base`
as an option there.
Finally, there are two errors in the usage of `git diff`. Firstly, we do
not mention `--merge-base` in the `git diff --cached` case. Mention it
so that it's consistent with the documentation. Secondly, we put the
`[--merge-base]` in between `<commit>` and `[<commit>...]`. Move the
`[--merge-base]` so that it's beside `[<options>]` which is a more
logical grouping.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the reflog_message() function added in
96e832a5fd (sequencer (rebase -i): refactor setting the reflog
message, 2017-01-02), it gained another user in
9055e401dd (sequencer: introduce new commands to reset the revision,
2018-04-25). Let's move it around and remove the forward declaration
added in the latter commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
9cf6d3357a (Add git-index-pack utility, 2005-10-12) and
466dbc42f5 (receive-pack: Send internal errors over side-band #2,
2010-02-10) we added these static functions and forward-declared their
__attribute__((printf)).
I think this may have been to work around some compiler limitation at
the time, but in any case we have a lot of code that uses the briefer
way of declaring these that I'm using here, so if we had any such
issues with compilers we'd have seen them already.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's add a new "add-clone" subcommand to `git submodule--helper` with
the goal of converting part of the shell code in git-submodule.sh
related to `git submodule add` into C code. This new subcommand clones
the repository that is to be added, and checks out to the appropriate
branch.
This is meant to be a faithful conversion that leaves the behaviour of
'cmd_add()' script unchanged.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Prathamesh Chavan <pc44800@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate out the core logic of module_clone() from the flag
parsing---this way we can call the equivalent of the `submodule--helper
clone` subcommand directly within C, without needing to push arguments
in a strvec.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The standard `die()` function that is used in C code prefixes all the
messages passed to it with 'fatal: '. This does not happen with the
`die` used in 'git-submodule.sh'.
Let's prefix each of the shell die messages with 'fatal: ' so that when
they are converted to C code, the error messages stay the same as before
the conversion.
Note that the shell version of `die` exits with error code 1, while the
C version exits with error code 128. In practice, this does not change
any behaviour, as no functionality in 'submodule add' and 'submodule
update' relies on the value of the exit code.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In general, we encourage users to use plumbing commands, like git
rev-list, over porcelain commands, like git log, when scripting.
However, git rev-list has one glaring problem that prevents it from
being used in certain cases: when --pretty is used with a custom format,
it always prints out a line containing "commit" and the object ID. This
makes it unsuitable for many scripting needs, and forces users to use
git log instead.
While we can't change this behavior for backwards compatibility, we can
add an option to suppress this behavior, so let's do so, and call it
"--no-commit-header". Additionally, add the corresponding positive
option to switch it back on.
Note that this option doesn't affect the built-in formats, only custom
formats. This is exactly the same behavior as users already have from
git log and is what most users will be used to.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even when the `--allow-empty-message` option is given, "git commit"
offers an interactive editor session with prefilled message that says
the commit will be aborted if the buffer is emptied, which is wrong.
Remove the "an empty message aborts" part from the message when the
option is given to fix it.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Hu Jialun <hujialun@comp.nus.edu.sg>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Strings of hint messages inserted into editor on interactive commit was
scattered in-line, rendering the code harder to understand at first
glance.
Extract those messages out into separate variables to make the code
outline easier to follow.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Hu Jialun <hujialun@comp.nus.edu.sg>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a segfault in the --stdin-packs option added in
339bce27f4 (builtin/pack-objects.c: add '--stdin-packs' option,
2021-02-22).
The read_packs_list_from_stdin() function didn't check that the lines
it was reading were valid packs, and thus when doing the QSORT() with
pack_mtime_cmp() we'd have a NULL "util" field. The "util" field is
used to associate the names of included/excluded packs with the
packed_git structs they correspond to.
The logic error was in assuming that we could iterate all packs and
annotate the excluded and included packs we got, as opposed to
checking the lines we got on stdin. There was a check for excluded
packs, but included packs were simply assumed to be valid.
As noted in the test we'll not report the first bad line, but whatever
line sorted first according to the string-list.c API. In this case I
think that's fine. There was further discussion of alternate
approaches in [1].
Even though we're being lazy let's assert the line we do expect to get
in the test, since whoever changes this code in the future might miss
this case, and would want to update the test and comments.
1. http://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 0181b600a6 (pkt-line: define PACKET_READ_RESPONSE_END, 2020-05-19),
the Response End packet was defined for Git's network protocol. When the
patch was sent, it included an oversight where the error messages
referenced "stateless separator", the work-in-progress name, over
"response end", the final name chosen.
Correct these error messages by having them correctly reference
a "response end" packet.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei pointed out a typo in the Swedish translation, where a config
variable name had been copied incorrectly.
By introducing typo detection function in "git-po-helper", more typos
were found. All easy-to-fix typos were fixed in this commit.
Reported-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
When we cannot figure out how wide the terminal is, we use a
fallback value of 80 ourselves (which cannot be avoided), but when
we run the pager, we export it in COLUMNS, which forces the pager
to use the hardcoded value, even when the pager is perfectly
capable to figure it out itself. Stop exporting COLUMNS when we
fall back on the hardcoded default value for our own use.
* js/stop-exporting-bogus-columns:
pager: avoid setting COLUMNS when we're guessing its value
On Windows, mergetool has been taught to find kdiff3.exe just like
it finds winmerge.exe.
* ms/mergetools-kdiff3-on-windows:
mergetools/kdiff3: make kdiff3 work on Windows too
Output from some of our tests were affected by the width of the
terminal that they were run in, which has been corrected by
exporting a fixed value in the COLUMNS environment.
* ab/fix-columns-to-80-during-tests:
test-lib.sh: set COLUMNS=80 for --verbose repeatability
Recent update to completion script (in contrib/) broke those who
use the __git_complete helper to define completion to their custom
command.
* fw/complete-cmd-idx-fix:
completion: bash: fix late declaration of __git_cmd_idx
Some test scripts assumed that readlink(1) was universally
installed and available, which is not the case.
* jk/test-without-readlink-1:
t: use portable wrapper for readlink(1)
The side-band demultiplexer that is used to display progress output
from the remote end did not clear the line properly when the end of
line hits at a packet boundary, which has been corrected. Also
comes with test clean-ups.
* jx/sideband-cleanup:
test: refactor to use "get_abbrev_oid" to get abbrev oid
test: refactor to use "test_commit" to create commits
test: compare raw output, not mangle tabs and spaces
sideband: don't lose clear-to-eol at packet boundary
We broke "GIT_SKIP_TESTS=t?000" to skip certain tests in recent
update, which got fixed.
* jk/test-avoid-globmatch-with-skip-patterns:
test-lib: avoid accidental globbing in match_pattern_list()
Work around inefficient glob substitution in older versions of bash
by rewriting parts of a test.
* jx/t6020-with-older-bash:
t6020: fix incompatible parameter expansion
Make the codebase MSAN clean.
* ah/uninitialized-reads-fix:
builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints
split-index: use oideq instead of memcmp to compare object_id's
bulk-checkin: make buffer reuse more obvious and safer
Update "git subtree" to work better on Windows.
* js/subtree-on-windows-fix:
subtree: fix assumption about the directory separator
subtree: fix the GIT_EXEC_PATH sanity check to work on Windows
"git-svn" tests assumed that "locale -a", which is used to pick an
available UTF-8 locale, is available everywhere. A knob has been
introduced to allow testers to specify a suitable locale to use.
* dd/svn-test-wo-locale-a:
t: use user-specified utf-8 locale for testing svn
The command line completion (in contrib/) learned that "git diff"
takes the "--anchored" option.
* tb/complete-diff-anchored:
completion: add --anchored to diff's options
The recent --negotiate-only option would segfault in the call to
oid_array_for_each() in negotiate_using_fetch() unless one or more
--negotiation-tip=* options were provided.
All of the other tests for the feature combine both, but nothing was
checking this assumption, let's do that and add a test for it. Fixes a
bug in 9c1e657a8f (fetch: teach independent negotiation (no
packfile), 2021-05-04).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This saves 8K per `struct object_directory', meaning it saves
around 800MB in my case involving 100K alternates (half or more
of those alternates are unlikely to hold loose objects).
This is implemented in two parts: a generic, allocation-free
`cbtree' and the `oidtree' wrapper on top of it. The latter
provides allocation using alloc_state as a memory pool to
improve locality and reduce free(3) overhead.
Unlike oid-array, the crit-bit tree does not require sorting.
Performance is bound by the key length, for oidtree that is
fixed at sizeof(struct object_id). There's no need to have
256 oidtrees to mitigate the O(n log n) overhead like we did
with oid-array.
Being a prefix trie, it is natively suited for expanding short
object IDs via prefix-limited iteration in
`find_short_object_filename'.
On my busy workstation, p4205 performance seems to be roughly
unchanged (+/-8%). Startup with 100K total alternates with no
loose objects seems around 10-20% faster on a hot cache.
(800MB in memory savings means more memory for the kernel FS
cache).
The generic cbtree implementation does impose some extra
overhead for oidtree in that it uses memcmp(3) on
"struct object_id" so it wastes cycles comparing 12 extra bytes
on SHA-1 repositories. I've not yet explored reducing this
overhead, but I expect there are many places in our code base
where we'd want to investigate this.
More information on crit-bit trees: https://cr.yp.to/critbit.html
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with `oidcpy', the source struct will not be modified and
this will allow an upcoming const-correct caller to use it.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no point in using 8 bits per-directory when 1 bit
will do. This saves us 224 bytes per object directory, which
ends up being 22MB when dealing with 100K alternates.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can save a few milliseconds (across 100K odbs) by using
strbuf_addbuf() instead of strbuf_addstr() by passing `entry' as
a strbuf pointer rather than a "const char *".
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With many alternates, the duplicate check in alt_odb_usable()
wastes many cycles doing repeated fspathcmp() on every existing
alternate. Use a khash to speed up lookups by odb->path.
Since the kh_put_* API uses the supplied key without
duplicating it, we also take advantage of it to replace both
xstrdup() and strbuf_release() in link_alt_odb_entry() with
strbuf_detach() to avoid the allocation and copy.
In a test repository with 50K alternates and each of those 50K
alternates having one alternate each (for a total of 100K total
alternates); this speeds up lookup of a non-existent blob from
over 16 minutes to roughly 2.7 seconds on my busy workstation.
Note: all underlying git object directories were small and
unpacked with only loose objects and no packs. Having to load
packs increases times significantly.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When looking for things that hardcoded a non-zero "hint" parameter to
strbuf_fread() I discovered that since f2561fda36 (Add git-imap-send,
derived from isync 1.0.1., 2006-03-10) we've been passing a hardcoded
4096 in imap-send.c to read stdin.
Since we're not doing anything unusual here let's use a less verbose
pattern used in a lot of other places (the hint of "0" will default to
8192). We don't need to take a FILE * here either, so we can use "0"
instead of "stdin". While we're at it improve the error message if we
can't read the input to use error_errno().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test to ensure failure on adding a submodule to a directory with
tracked contents in the index.
As we are going to refactor and port to C some parts of `git submodule
add`, let's add a test to help ensure no regression is introduced.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Based-on-patch-by: Shourya Shukla <periperidip@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation reads as if .gitignore files will be parsed in
every parent directory, and not until they reach a repository boundary.
This clarifies the current behaviour.
As well, this corrects 'toplevel' to 'top-level', matching usage for
'top-level domain'.
Signed-off-by: Andrew Berry <andrew@furrypaws.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Depending on the chosen format of help pages, git-help uses function
show_man_page, show_info_page, or show_html_page. The first thing all
three functions do is to convert given `git_cmd` to a `page` using
function cmd_to_page.
Move the common part of these three functions to function cmd_help to
avoid code duplication.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Felipe Contreras <felipe.contreras@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use our standard allocation functions and macros (xcalloc,
ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h. They terminate
the program on error instead, so code that's using them doesn't have to
handle allocation failures. Make this behavior explicit by turning
kh_resize_ into a void function and removing the related unreachable
error handling code.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t6402, we're checking number of files in the index and the working
tree by piping the output of Git's command to "wc -l", thus losing the
exit status code of git.
Let's use the new helper test_stdout_line_count in order to preserve
Git's exit status code.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t6400, we're checking number of files in the index and the working
tree by piping the output of "git ls-files" to "wc -l", thus losing the
exit status code of git.
Let's use the newly introduced test_stdout_line_count in order to check
the exit status code of Git's command.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some tests, we're checking the number of lines in output of some
commands, including but not limited to Git's command.
We're doing the check by running those commands in the left side of
a pipe, thus losing the exit status code of those commands. Meanwhile,
we really want to check the exit status code of Git's command.
Let's write the output of those commands to a temporary file, and use
test_line_count separately in order to check exit status code of
those commands properly.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By upgrading from v1 to v2 of `actions/checkout`, we avoid fetching all
the tags and the complete history: v2 only fetches one revision by
default. This should make things a lot faster.
Note that `actions/checkout@v2` seems to be incompatible with running in
containers: https://github.com/actions/checkout/issues/151. Therefore,
we stick with v1 there.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already build Git for Windows with `NO_GETTEXT` when compiling with
GCC. Let's do the same with Visual C, too.
Note that we do not technically _need_ to pass `NO_GETTEXT` explicitly
in that `make artifacts-tar` invocation because we do this while `MSVC`
is set (which will set `uname_S := Windows`, which in turn will set
`NO_GETTEXT = YesPlease`). But it is definitely nicer to be explicit
here.
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Helped-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We obviously do not want to bundle `.mo` files during `make
artifacts-tar NO_GETTEXT=Yep`, but that was the case.
To fix that, go a step beyond just fixing the symptom, and simply
define the lists of `.po` and `.mo` files as empty if `NO_GETTEXT` is
set.
Helped-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's test suite is excruciatingly slow on Windows, mainly due to the
fact that it executes a lot of shell script code, and that's simply not
native to Windows.
To help with that, we established the pattern where the artifacts are
first built in one job, and then multiple test jobs run in parallel
using the artifacts built in the first job.
We take pains in transferring only the build outputs, and letting
`actions/checkout` fill in the rest of the files.
One major downside of that strategy is that the test jobs might fail to
check out the intended revision (e.g. because the branch has been
updated while the build was running, as is frequently the case with the
`seen` branch).
Let's transfer also the files tracked by Git, and skip the checkout step
in the test jobs.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move away from the "struct ref_list" in bundle.c in favor of the
almost identical string-list.c API.
That API fits this use-case perfectly, but did not exist in its
current form when this code was added in 2e0afafebd (Add git-bundle:
move objects and references by archive, 2007-02-22), with hindsight we
could have used the path-list API, which later got renamed to
string-list. See 8fd2cb4069 (Extract helper bits from
c-merge-recursive work, 2006-07-25)
We need to change "name" to "string" and "oid" to "util" to make this
conversion, but other than that the APIs are pretty much identical for
what bundle.c made use of.
Let's also replace the memset(..,0,...) pattern with a more idiomatic
"INIT" macro, and finally add a *_release() function so to free the
allocated memory.
Before this the add_to_ref_list() would leak memory, now e.g. "bundle
list-heads" reports no memory leaks at all under valgrind.
In the bundle_header_init() function we're using a clever trick to
memcpy() what we'd get from the corresponding
BUNDLE_HEADER_INIT. There is a concurrent series to make use of that
pattern more generally, see [1].
1. https://lore.kernel.org/git/cover-0.5-00000000000-20210701T104855Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for moving away from accessing the OID and name via the
"oid" and "name" slots in a subsequent commit, change the code that
accesses it to use named variables. This makes the subsequent change
smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak from the prefix_filename() function introduced with
its use in 3b754eedd5 (bundle: use prefix_filename with bundle path,
2017-03-20).
As noted in that commit the leak was intentional as a part of being
sloppy about freeing resources just before we exit, I'm changing this
because I'll be fixing other memory leaks in the bundle API (including
the library version) in subsequent commits. It's easier to reason
about those fixes if valgrind runs cleanly at the end without any
leaks whatsoever.
An earlier version of this change[1] went out of its way to not leak
memory on the die() codepaths here, but doing so will only avoid
reports of potential leaks under heap-only leak trackers such as
valgrind, not the SANITIZE=leak mode.
Avoiding those leaks as well might be useful to enable us to run
cleanly under the likes of valgrind in the future. But for now the
relative verbosity of the resulting code, and the fact that we don't
have some valgrind or SANITIZE=leak mode as part of our CI (it's only
run ad-hoc, see [2]), means we're not worrying about that for now.
1. https://lore.kernel.org/git/87v95vdxrc.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/87czsv2idy.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the TEST_OUTPUT_DIRECTORY is defined, then all test data will be
written in that directory instead of the default directory located in
"t/". While this works as expected for our normal tests, performance
tests fail to locate and aggregate performance data because they don't
know to handle TEST_OUTPUT_DIRECTORY correctly and always look at the
default location.
Fix the issue by adding a `--results-dir` parameter to "aggregate.perl"
which identifies the directory where results are and by making the "run"
script awake of the TEST_OUTPUT_DIRECTORY variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change all in-tree users of the string_list_init(LIST, BOOL) API to
use string_list_init_{nodup,dup}(LIST) instead.
As noted in the preceding commit let's leave the now-unused
string_list_init() wrapper in-place for any in-flight users, it can be
removed at some later date.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to use the new "memcpy() a 'blank' struct on the stack"
pattern for string_list_init(), and to make the macro initialization
consistent with the function initialization introduce two new
string_list_init_{nodup,dup}() functions. These are like the old
string_list_init() when called with a false and true second argument,
respectively.
I think this not only makes things more consistent, but also easier to
read. I often had to lookup what the ", 0)" or ", 1)" in these
invocations meant, now it's right there in the function name, and
corresponds to the macros.
A subsequent commit will convert existing API users to this pattern,
but as this is a very common API let's leave a compatibility function
in place for later removal. This intermediate state also proves that
the compatibility function works.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the dir_init() function and replace it with a DIR_INIT
macro. In many cases in the codebase we need to initialize things with
a function for good reasons, e.g. needing to call another function on
initialization. The "dir_init()" function was not one such case, and
could trivially be replaced with a more idiomatic macro initialization
pattern.
The only place where we made use of its use of memset() was in
dir_clear() itself, which resets the contents of an an existing struct
pointer. Let's use the new "memcpy() a 'blank' struct on the stack"
idiom to do that reset.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the common patter in the codebase of duplicating the
initialization logic between an *_INIT macro and a
corresponding *_init() function to use the macro as the canonical
source of truth.
Now we no longer need to keep the function up-to-date with the macro
version. This implements a suggestion by Jeff King who found that
under -O2 [1] modern compilers will init new version in place without
the extra copy[1]. The performance of a single *_init() won't matter
in most cases, but even if it does we're going to be producing
efficient machine code to perform these operations.
1. https://lore.kernel.org/git/YNyrDxUO1PlGJvCn@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move *_INIT macros I'll use in a subsequent commits to designated
initializers. This isn't required for those follow-up changes, but
since next commits will change things in this area, let's use the
modern pattern over the old one while we're at it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a custom match_pattern_list() function which we use for matching
test names (like "t1234") against glob-like patterns (like "t1???") for
$GIT_SKIP_TESTS, --verbose-only, etc.
Those patterns may have multiple whitespace-separated elements (e.g.,
"t0* t1234 t5?78"). The callers of match_pattern_list thus pass the
strings unquoted, so that the shell does the usual field-splitting into
separate arguments.
But this also means the shell will do the usual globbing for each
argument, which can result in us seeing an expansion based on what's in
the filesystem, rather than the real pattern. For example, if I have the
path "t5000" in the filesystem, and you feed the pattern "t?000", that
_should_ match the string "t0000", but it won't after the shell has
expanded it to "t5000".
This has been a bug ever since that function was introduced. But it
didn't usually trigger since we typically use the function inside the
trash directory, which has a very limited set of files that are unlikely
to match. It became a lot easier to trigger after edc23840b0 (test-lib:
bring $remove_trash out of retirement, 2021-05-10), because now we match
$GIT_SKIP_TESTS before even entering the trash directory. So the t5000
example above can be seen with:
GIT_SKIP_TESTS=t?000 ./t0000-basic.sh
which should skip all tests but doesn't.
We can fix this by using "set -f" to ask the shell not to glob (which is
in POSIX, so should hopefully be portable enough). We only want to do
this in a subshell (to avoid polluting the rest of the script), which
means we need to get the whole string intact into the match_pattern_list
function by quoting it. Arguably this is a good idea anyway, since it
makes it much more obvious that we intend to split, and it's not simply
sloppy scripting.
Diagnosed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was no documentation for the --negotiate-only option added in
9c1e657a8f (fetch: teach independent negotiation (no packfile),
2021-05-04), only documentation for the related push.negotiation
option added in the following commit in 477673d6f3 (send-pack:
support push negotiation, 2021-05-04).
Let's document it, and update the cross-linking I'd added between
--negotiation-tip=* and 'fetch.negotiationAlgorithm' in
526608284a (fetch doc: cross-link two new negotiation options,
2018-08-01).
I think it would be better to say "in common with the remote" here
than "...the server", but the documentation for --negotiation-tip=*
above this talks about "the server", so let's continue doing that in
this related option. See 3390e42adb (fetch-pack: support negotiation
tip whitelist, 2018-07-02) for that documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the early return if we have no remote refs in send_pack()
earlier.
When this was added in 4c353e890c (Warn when send-pack does nothing,
2005-12-04) one of the first things we'd do was to abort, but as of
cfee10a773 (send-pack/receive-pack: allow errors to be reported back
to pusher., 2005-12-25) we've added numerous server_supports()
conditions that are acted on later in the function, that won't be used
if we don't have remote refs.
Then as of 477673d6f3 (send-pack: support push negotiation,
2021-05-04) we started doing even more work on the assumption that we
had some remote refs to feed to --negotiation-tip=* options.
We only hit this condition if we have nothing to push, so we don't
need to consider "push.negotiate" etc. only to do nothing with that
information.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Directory rename detection can cause transitive renames, e.g. if the two
different sides of history each do one half of:
A/file -> B/file
B/ -> C/
then directory rename detection transitively renames to give us
A/file -> C/file
However, when C/ == A/, note that this gives us
A/file -> A/file.
merge-recursive assumed that any rename D -> E would have D != E. While
that is almost always true, the above is a special case where it is not.
So we cannot do things like delete the rename source, we cannot assume
that a file existing at path E implies a rename/add conflict and we have
to be careful about what stages end up in the output.
This change feels a bit hackish. It took me surprisingly many hours to
find, and given merge-recursive's design causing it to attempt to
enumerate all combinations of edge and corner cases with special code
for each combination, I'm worried there are other similar fixes needed
elsewhere if we can just come up with the right special testcase.
Perhaps an audit would rule it out, but I have not the energy.
merge-recursive deserves to die, and since it is on its way out anyway,
fixing this particular bug narrowly will have to be good enough.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Path conflicts (typically rename path conflicts, e.g.
rename/rename(1to2) or rename/add/delete), and directory/file conflicts
should obviously result in files not being marked as clean in the merge.
We had a codepath where we missed consulting the path_conflict and
df_conflict flags, based on match_mask. Granted, it requires an unusual
setup to trigger this codepath (directory rename causing rename-to-self
is the only case I can think of), but we still need to handle it. To
make it clear that we have audited the other codepaths that do not
explicitly mention these flags, add some assertions that the flags are
not set.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Directory rename detection can cause transitive renames, e.g. if the two
different sides of history each do one half of:
A/file -> B/file
B/ -> C/
then directory rename detection transitively renames to give us C/file.
Since the default for merge.directoryRenames is conflict, this results
in an error message saying it is unclear whether the file should be
placed at B/file or C/file.
What if C/ is A/, though? In such a case, the transitive rename would
give us A/file, the original name we started with. Logically, having
an error message with B/file vs. A/file should be fine, as should
leaving the file where it started. But the logic in both
merge-recursive and merge-ort did not handle a case of a filename being
renamed to itself correctly; merge-recursive had two bugs, and merge-ort
had one. Add some testcases covering such a scenario.
Based-on-testcase-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git grep allows combining two patterns with --and. It checks and
reports if the second pattern is missing when compiling the expression.
A missing first pattern, however, is only reported later at match time.
Thus no error is returned if no matching is done, e.g. because no file
matches the also given pathspec.
When that happens we get an expression tree with an GREP_NODE_AND node
and a NULL pointer to the missing left child. free_pattern_expr()
tries to dereference it during the cleanup at the end, which results
in a segmentation fault.
Fix this by verifying the presence of the left operand at expression
compilation time.
Reported-by: Matthew Hughes <matthewhughes934@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linux users may benefit from additional information on how to
avoid ENOMEM from mmap despite the system having enough RAM to
accomodate them. We can't reliably unmap pack windows to work
around the issue since malloc and other library routines may
mmap without our knowledge.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests will fail under --verbose because while we've unset COLUMNS
since b1d645b58a (tests: unset COLUMNS inherited from environment,
2012-03-27), we also look for the columns with an ioctl(..,
TIOCGWINSZ, ...) on some platforms. By setting COLUMNS again we
preempt the TIOCGWINSZ lookup in pager.c's term_columns(), it'll take
COLUMNS over TIOCGWINSZ,
This fixes t0500-progress-display.sh., which broke because of a
combination of the this issue and the progress output reacting to the
column width since 545dc345eb (progress: break too long progress bar
lines, 2019-04-12). The t5324-split-commit-graph.sh fails in a similar
manner due to progress output, see [1] for details.
The issue is not specific to progress.c, the diff code also checks
COLUMNS and some of its tests can be made to fail in a similar
manner[2], anything that invokes a pager is potentially affected.
See ea77e675e5 (Make "git help" react to window size correctly,
2005-12-18) and ad6c3739a3 (pager: find out the terminal width before
spawning the pager, 2012-02-12) for how the TIOCGWINSZ code ended up
in pager.c
1. http://lore.kernel.org/git/20210624051253.GG6312@szeder.dev
2. https://lore.kernel.org/git/20210627074419.GH6312@szeder.dev/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the GNU make ".DELETE_ON_ERROR" flag in our main Makefile, as we
already do in the Documentation/Makefile since db10fc6c09 (doc:
simplify Makefile using .DELETE_ON_ERROR, 2021-05-21).
Now if a command to make X fails X will be removed, the default
behavior of GNU make is to only do so if "make" itself is interrupted
with a signal.
E.g. if we now intentionally break one of the rules with:
- mv $@+ $@
+ mv $@+ $@ && \
+ false
We'll get output like:
$ make git
CC git.o
LINK git
make: *** [Makefile:2179: git] Error 1
make: *** Deleting file 'git'
$ file git
git: cannot open `git' (No such file or directory)
Before this change we'd leave the file in place in under this
scenario.
As in db10fc6c09 this allows us to remove patterns of removing
leftover $@ files at the start of rules, since previous failing runs
of the Makefile won't have left those littered around anymore.
I'm not as confident that we should be replacing the "mv $@+ $@"
pattern entirely, since that means that external programs or one of
our other Makefiles might race and get partial content.
I'm not changing $(REMOTE_CURL_ALIASES) since that uses a ln/ln -s/cp
dance, and would require the addition of "-f" flags if the "rm" at the
start was removed. I've also got plans to fix that ln/ln -s/cp pattern
in another series.
For $(LIB_FILE) and $(XDIFF_LIB) we can rely on the "c" (create) being
present in ARFLAGS.
I'm not changing "$(ETAGS_TARGET)", "tags" and "cscope" because
they've got a messy combination of removing "$@+" not "$@" at the
beginning, or "$@*". I'm also addressing those in another series.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git multi-pack-index verify' inspects the data in an existing MIDX for
correctness by checking that the recorded object offsets are correct,
and so on.
But it does not check that the file's trailing checksum matches the data
that it records. So, if an on-disk corruption happened to occur in the
final few bytes (and all other data was recorded correctly), we would:
- get a clean result from 'git multi-pack-index verify', but
- be unable to reuse the existing MIDX when writing a new one (since
we now check for checksum mismatches before reusing a MIDX)
Teach the 'verify' sub-command to recognize corruption in the checksum
by calling midx_checksum_valid().
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new multi-pack index, Git tries to reuse as much of the
data from an existing MIDX as possible, like object offsets. This is
done to avoid re-opening a bunch of *.idx files unnecessarily, but can
lead to problems if the data we are reusing is corrupt.
That's because we'll blindly reuse data from an existing MIDX without
checking its trailing checksum for validity. So if there is memory
corruption while writing a MIDX, or disk corruption in the intervening
period between writing and reuse, we'll blindly propagate those bad
values forward.
Suppose we experience a memory corruption while writing a MIDX such that
we write an incorrect object offset (or alternatively, the disk corrupts
the data after being written, but before being reused). Then when we go
to write a new MIDX, we'll reuse the bad object offset without checking
its validity. This means that the MIDX we just wrote is broken, but its
trailing checksum is in-tact, since we never bothered to look at the
values before writing.
In the above, a "git multi-pack-index verify" would have caught the
problem before writing, but writing a new MIDX wouldn't have noticed
anything wrong, blindly carrying forward the corrupt offset.
Individual pack indexes check their validity by verifying the crc32
attached to each entry when carrying data forward during a repack.
We could solve this problem for MIDXs in the same way, but individual
crc32's don't make much sense, since their entries are so small.
Likewise, checking the whole file on every read may be prohibitively
expensive if a repository has a lot of objects, packs, or both.
But we can check the trailing checksum when reusing an existing MIDX
when writing a new one. And a corrupt MIDX need not stop us from writing
a new one, since we can just avoid reusing the existing one at all and
pretend as if we are writing a new MIDX from scratch.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite an existing caller in `git commit-graph verify` to take
advantage of checksum_valid().
Note that the replacement isn't a verbatim cut-and-paste, since the new
function avoids using hashfile at all and instead talks to the_hash_algo
directly, but it is functionally equivalent.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function which checks the validity of a file's trailing
checksum. This is similar to hashfd_check(), but different since it is
intended to be used by callers who aren't writing the same data (like
`git index-pack --verify`), but who instead want to validate the
integrity of data that they are reading.
Rewrite the first of two callers which could benefit from this new
function in pack-check.c. Subsequent callers will be added in the
following patches.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GitHub Actions to upload/download workflow artifacts saw a major
upgrade since Git's GitHub workflow was established. Let's use it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use a `.bat` script to copy the DLLs in the `vs-build` job, and those
type of scripts are native to CMD, not to PowerShell.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our continuous builds, Windows is the odd cookie that requires a
complete development environment to be downloaded because there is no
suitable one installed by default on Windows.
Side note: technically, there _is_ a development environment present in
GitHub Actions' build agents: MSYS2. But it differs from Git for
Windows' SDK in subtle points, unfortunately enough so to prevent Git's
test suite from running without failures.
Traditionally, we support downloading this environment (which we
nicknamed `git-sdk-64-minimal`) via a PowerShell scriptlet that accesses
the build artifacts of a dedicated Azure Pipeline (which packages a tiny
subset of the full Git for Windows SDK, containing just enough to build
Git and run its test suite).
This PowerShell script is unfortunately not very robust and sometimes
fails due to network issues.
Of course, we could add code to detect that situation, wait a little,
try again, if it fails again wait a little longer, lather, rinse and
repeat.
Instead of doing all of this in Git's own `.github/workflows/`, though,
let's offload this logic to the new GitHub Action at
https://github.com/marketplace/actions/setup-git-for-windows-sdk
This Action not only downloads and extracts git-sdk-64-minimal _outside_
the worktree (making it no longer necessary to meddle with
`.gitignore` or `.git/info/exclude`), it also adds the `bash.exe` to the
`PATH` and sets the environment variable `MSYSTEM` (an implementation
detail that Git's workflow should never have needed to know about).
This allows us to convert all those funny PowerShell tasks that wanted
to call git-sdk-64-minimal's `bash.exe`: they all are now regular `bash`
scriptlets.
This finally lets us get rid of the funny quoting and escaping where we
had to pay attention not only to quote and escape the Bash scriptlets
properly, but also to add a second level of escaping (with backslashes
for double quotes and backticks for dollar signs) to stop PowerShell
from doing unintended things.
Further, this Action uses a fast caching strategy native to GitHub
Actions that should accelerate the download across CI runs:
git-sdk-64-minimal is usually updated once per 24h, and needs to be
cached only once within that period. Caching it (unfortunately only on
a per-branch basis) speeds up the download step, and makes it much more
robust at the same time by virtue of accessing a cache location that is
closer in the network topology.
With this we can drop the home-rolled caching where we try to accelerate
the test phase by uploading git-sdk-64-minimal as a workflow artifact
after using it to build Git, and then download it as workflow artifact
in the test phase.
Even better: the `vs-test` job no longer needs to depend on the
`windows-build` job. The only reason it depended on it was to ensure
that the `git-sdk-64-minimal` workflow artifact was available.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have two types (a decoration type and an object type) in the
function, let's give them both unique names to avoid accidentally using
one instead of the other.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we load the ref decorations, we parse the object pointed to by each
ref in order to get a "struct object". This is unnecessarily expensive;
we really only need the object struct, and don't even look at the parsed
contents. The exception is tags, which we do need to peel.
We can improve this by looking up the object type first (which is much
cheaper), and skipping the parse entirely for non-tags. This increases
the work slightly for annotated tags (which now do a type lookup _and_ a
parse), but decreases it a lot for other types. On balance, this seems
to be a good tradeoff.
In my git.git clone, with ~2k refs, most of which are branches, the time
to run "git log -1 --decorate" drops from 34ms to 11ms. Even on my
linux.git clone, which contains mostly tags and only a handful of
branches, the time drops from 30ms to 19ms. And on a more extreme
real-world case with ~220k refs, mostly non-tags, the time drops from
2.6s to 650ms.
That command is a lop-sided example, of course, because it does as
little non-loading work as possible. But it does show the absolute time
improvement. Even in something like a full "git log --decorate" on that
extreme repo, we'd still be saving 2s of CPU time.
Ideally we could push this even further, and avoid parsing even tags, by
relying on the packed-refs "peel" optimization (which we could do by
calling peel_iterated_oid() instead of peeling manually). But we can't
do that here. The packed-refs file only stores the bottom-layer of the
peel (so in a "tag->tag->commit" chain, it stores only the commit as the
peel result). But the decoration code wants to peel the layers
individually, annotating the middle layers of the chain.
If the packed-refs file ever learns to store all of the peeled layers,
then we could switch to it. Or even if it stored a flag to indicate the
peel was not multi-layer (because most of them aren't), then we could
use it most of the time and fall back to a manual peel for the rare
cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases it's useful for efficiency reasons to get the type of an
object before deciding whether to parse it, but we still want an object
struct. E.g., in reachable.c, bitmaps give us the type, but we just want
to mark flags on each object. Likewise, we may loop over every object
and only parse tags in order to peel them; checking the type first lets
us avoid parsing the non-tags.
But our lookup_blob(), etc, functions make getting an object struct
annoying: we have to call the right function for every type. And we
cannot just use the generic lookup_object(), because it only returns an
already-seen object; it won't allocate a new object struct.
Let's provide a function that dispatches to the correct lookup_*
function based on a run-time type. In fact, reachable.c already has such
a helper, so we'll just make that public.
I did change the return type from "void *" to "struct object *". While
the former is a clever way to avoid casting inside the function, it's
less safe and less informative to people reading the function
declaration.
The next commit will add a new caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lookup_unknown_object() system is not often used and is somewhat
confusing. Let's try to explain it a bit more (which is especially
important as I'm adding a related but slightly different function in the
next commit).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If no --decorate option is given, we default to auto-decoration. And
when that kicks in, cmd_log_init_finish() will unconditionally load the
decoration refs.
However, if we are using a user-format that does not include "%d" or
"%D", we won't show the decorations at all, so we don't need to load
them. We can detect this case and auto-disable them by adding a new
field to our userformat_want helper. We can do this even when the user
explicitly asked for --decorate, because it can't affect the output at
all.
This patch consistently reduces the time to run "git log -1 --format=%H"
on my git.git clone (with ~2k refs) from 34ms to 7ms. On a much more
extreme real-world repository (with ~220k refs), it goes from 2.5s to
4ms.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment only mentions "notes", but there are more fields now (and
I'm about to add another). Let's make it more general, and stick the
struct next to the function to make the list of possibilities obvious.
While we're touching this comment, let's also mention the behavior of
NULL, which some callers rely on (though in the long run, this global is
pretty nasty and probably should get moved into rev_info).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over time when parts of submodule have been ported from shell to
builtin, many instances of the submodule helper have been added.
Also added with them are some unnecessary option passing
logic that are based on the `prefix` shell variable which never
gets set in their code flows.
On analysis, the only shell functions which have a valid usage
for the `prefix` shell variable are:
- cmd_update: which is the only function which sets the variable
and thus uses it properly
- cmd_init: which uses the variable via a call from cmd_update
So, remove the unnecessary option parsing logic based on the `prefix`
shell variable.
Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cover blindspots in the testing of stdin handling, including the
"!len" condition added in b5d97e6b0a (pack-objects: run rev-list
equivalent internally., 2006-09-04). The codepath taken with --revs
and read_object_list_from_stdin() acts differently in some of these
common cases, let's test for those.
The "--stdin --revs" test being added here stresses the combination of
--stdin-packs and the revision.c --stdin argument, some of this was
covered in a test added in 339bce27f4 (builtin/pack-objects.c: add
'--stdin-packs' option, 2021-02-22), but let's make sure that
GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS=true keeps erroring out about
--stdin, and it isn't picked up by the revision.c API's handling of
that option.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git for Windows is compiled with a runtime prefix, and that runtime
prefix is typically `C:/Program Files/Git/mingw64`. As we want the
system gitconfig to live in the sibling directory `etc`, we define the
relative path as `../etc/gitconfig`.
However, as reported by Philip Oakley, the output of `git config
--show-origin --system -l` looks rather ugly, as it shows the path as
`file:C:/Program Files/Git/mingw64/../etc/gitconfig`, i.e. with the
`mingw64/../` part.
By normalizing the path, we get a prettier path.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, when Git for Windows is built with CMake, the system Git config is
expected in a different location than when building via `make`: the former
expects it to be in `<runtime-prefix>/mingw64/etc/gitconfig`, the latter in
`<runtime-prefix>/etc/gitconfig`.
Because of this, things like `git clone` do not work correctly (because cURL is
no longer able to find its certificate bundle that it needs to validate HTTPS
certificates). See the full bug report and discussion here:
https://github.com/git-for-windows/git/issues/3071#issuecomment-789261386.
This commit aligns the CMake-based build by mimicking what is already done in
`config.mak.uname`.
This closes https://github.com/git-for-windows/git/issues/3071.
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git for Windows' prefix is `/mingw64/` (or `/mingw32/` for 32-bit
versions), therefore the system config is located at the clunky location
`C:\Program Files\Git\mingw64\etc\gitconfig`.
This moves the system config into a more logical location: the `mingw64`
part of `C:\Program Files\Git\mingw64\etc\gitconfig` never made sense,
as it is a mere implementation detail. Let's skip the `mingw64` part and
move this to `C:\Program Files\Git\etc\gitconfig`.
Side note: in the rare (and not recommended) case a user chooses to
install 32-bit Git for Windows on a 64-bit system, the path will of
course be `C:\Program Files (x86)\Git\etc\gitconfig`.
Background: During the Git for Windows v1.x days, the system config was
located at `C:\Program Files (x86)\Git\etc\gitconfig`. With Git for
Windows v2.x, it moved to `C:\Program Files\Git\mingw64\gitconfig` (or
`C:\Program Files (x86)\Git\mingw32\gitconfig`). Rather than fixing it
back then, we tried to introduce a "Windows-wide" config, but that never
caught on.
Likewise, we move the system `gitattributes` into the same directory.
Obviously, we are cautious to do this only for the known install
locations `/mingw64` and `/mingw32`; If anybody wants to override that
while building their version of Git (e.g. via `make prefix=$HOME`), we
leave the default location of the system config and gitattributes alone.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Typofix (chose -> choose) in the documentation of the patch option
under the commit command.
Signed-off-by: Beshr Kayali <me@beshr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We query `TIOCGWINSZ` in Git to determine the correct value for
`COLUMNS`, and then set that environment variable.
If `TIOCGWINSZ` is not available, we fall back to the hard-coded value
80 _and still_ set the environment variable.
On Windows this is a problem. The reason is that Git for
Windows uses a version of `less` that relies on the MSYS2 runtime to
interact with the pseudo terminal (typically inside a MinTTY window,
which is also aware of the MSYS2 runtime). Both MinTTY and `less.exe`
interact with that pseudo terminal via `ioctl()` calls (which the MSYS2
runtime emulates even if there is no such thing on Windows).
Since https://github.com/gwsw/less/commit/bb0ee4e76c2, `less` prefers
the `COLUMNS` variable over asking ncurses itself.
But `git.exe` itself is _not_ aware of the MSYS2 runtime, or for that
matter of that pseudo terminal, and has no way to call `ioctl()` or
`TIOCGWINSZ`.
Therefore, `git.exe` will fall back to hard-coding 80 columns, no matter
what the actual terminal size is.
But `less.exe` is totally able to interact with the MSYS2 runtime and
would not actually require Git's help (which actually makes things
worse here). So let's not override `COLUMNS` on Windows.
Let's just not set `COLUMNS` unless we managed to query the actual value
from the terminal.
This fixes https://github.com/git-for-windows/git/issues/3235
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both in t4258 and in t9001, the code of the tests following shows the
proper name for the configuration variables. So use the correct names
in the test messages as well.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As can be seen in files "Documentation/blame-options.txt" and
"builtin/blame.c", the name of this configuration option is
"blame.markUnblamableLines".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one step towards supporting partial clone submodules.
Even after this patch, we will still lack partial clone submodules
support, primarily because a lot of Git code that accesses submodule
objects does so by adding their object stores as alternates, meaning
that any lazy fetches that would occur in the submodule would be done
based on the config of the superproject, not of the submodule. This also
prevents testing of the functionality in this patch by user-facing
commands. So for now, test this mechanism using a test helper.
Besides that, there is some code that uses the wrapper functions
like has_promisor_remote(). Those will need to be checked to see if they
could support the non-wrapper functions instead (and thus support any
repository, not just the_repository).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule.c has functionality that prepares the environment for running
a subprocess in a new repo. The lazy-fetching code (used in partial
clones) will need this in a subsequent commit, so move it to a more
central location.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
14111fc492 ("git: submodule honor -c credential.* from command line",
2016-03-01) taught Git to pass through the GIT_CONFIG_PARAMETERS
environment variable when invoking a subprocess on behalf of a
submodule. But when d8d77153ea ("config: allow specifying config entries
via envvar pairs", 2021-01-15) introduced support for GIT_CONFIG_COUNT
(and its associated GIT_CONFIG_KEY_? and GIT_CONFIG_VALUE_?), the
subprocess mechanism wasn't updated to also pass through these
variables.
Since they are conceptually the same (d8d77153ea was written to address
a shortcoming of GIT_CONFIG_PARAMETERS), update the submodule subprocess
mechanism to also pass through GIT_CONFIG_COUNT.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using global variables to store promisor remote information,
store this config in struct repository instead, and add
repository-agnostic non-static functions corresponding to the existing
non-static functions that only work on the_repository.
The actual lazy-fetching of missing objects currently does not work on
repositories other than the_repository, and will still not work after
this commit, so add a BUG message explaining this. A subsequent commit
will remove this limitation.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move repository_format_partial_clone, which is currently a global
variable, into struct repository. (Full support for per-repository
partial clone config will be done in a subsequent commit - this is split
into its own commit because of the extent of the changes needed.)
The new repo-specific variable cannot be set in
check_repository_format_gently() (as is currently), because that
function does not know which repo it is operating on (or even whether
the value is important); therefore this responsibility is delegated to
the outermost caller that knows. Of all the outermost callers that know
(found by looking at all functions that call clear_repository_format()),
I looked at those that either read from the main Git directory or write
into a struct repository. These callers have been modified accordingly
(write to the_repository in the former case and write to the given
struct repository in the latter case).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the submit contain binary files, it will throw exception and stop submit when try to append diff line description.
This commit will skip non-text data files when exception UnicodeDecodeError thrown.
The skip will not affect actual submit files in the resulting cl,
the diff line description will only appear in submit template,
so you can review what changed before actully submit to p4.
I don't know if add any message here will be helpful for users,
so I choose to just skip binary content, since it already append filename previously.
Signed-off-by: dorgon.chang <dorgonman@hotmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing tests for --remotes, --list and --merge-base. These are
not exhaustive, but better than the nothing we have now.
There were some tests for this command added in f76412ed6d ([PATCH]
Add 'git show-branch'., 2005-08-21) has never been properly tested,
namely for the --all option in t6432-merge-recursive-space-options.sh,
and some of --merge-base and --independent in t6010-merge-base.sh.
This fixes a few more blind spots, but there's still a lot of behavior
that's not tested for.
These new tests show the odd (and possibly unintentional) behavior of
--merge-base with one argument, and how its output is the same as "git
merge-base" with N bases in this particular case. See the test added
in f621a8454d (git-merge-base/git-show-branch --merge-base:
Documentation and test, 2009-08-05) for a case where the two aren't
the same.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the colored output introduced in ab07ba2a24 (show-branch: color
the commit status signs, 2009-04-22) to not color and reset each
individual space character we use for padding. The intent is to color
just the "!", "+" etc. characters.
This makes the output easier to test, so let's do that now. The test
would be much more verbose without a color/reset for each space
character. Since the coloring cycles through colors we previously had
a "rainbow of space characters".
In theory this breaks things for anyone who's relying on the exact
colored output of show-branch, in practice I'd think anyone parsing it
isn't actively turning on the colored output.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the bad tags we've created in the mktag tests through
fast-export, it will die on the bad object or ref, let's make sure
that happens.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a "for-each-ref" for all the mktag tests. This test would have
caught the segfault which was fixed in c685450880 (ref-filter: fix
NULL check for parse object failure, 2021-04-01). Let's make sure we
test that code more exhaustively.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend the mktag tests to pass the created bad tag through update-ref
and fsck.
The reason for passing it through update-ref is to guard against it
having a segfault as for-each-ref did before c685450880 (ref-filter:
fix NULL check for parse object failure, 2021-04-01).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend the mktag tests to pass the tag we've created through both
hash-object --literally and fsck.
This checks that fsck itself will not complain about certain invalid
content if a reachable tip isn't involved. Due to how fsck works and
walks the graph the failure will be different if the object is
reachable, so we might succeed before we've created the ref.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-05)
introduced batching of fetching missing blobs, so that the diff
machinery would have one fetch subprocess grab N blobs instead of N
processes each grabbing 1.
However, the diff machinery is not the only thing in a merge that needs
to work on blobs. The 3-way content merges need them as well. Rather
than download all the blobs 1 at a time, prefetch all the blobs needed
for regular content merges.
This does not cover all possible paths in merge-ort that might need to
download blobs. Others include:
- The blob_unchanged() calls to avoid modify/delete conflicts (when
blob renormalization results in an "unchanged" file)
- Preliminary content merges needed for rename/add and
rename/rename(2to1) style conflicts. (Both of these types of
conflicts can result in nested conflict markers from the need to do
two levels of content merging; the first happens before our new
prefetch_for_content_merges() function.)
The first of these wouldn't be an extreme amount of work to support, and
even the second could be theoretically supported in batching, but all of
these cases seem unusual to me, and this is a minor performance
optimization anyway; in the worst case we only get some of the fetches
batched and have a few additional one-off fetches. So for now, just
handle the regular 3-way content merges in our prefetching.
For the testcase from the previous commit, the number of downloaded
objects remains at 63, but this drops the number of fetches needed from
32 down to 20, a sizeable reduction.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-ort was designed to minimize the amount of data needed and used,
and several changes were made to diffcore-rename to take advantage of
extra metadata to enable this data minimization (particularly the
relevant_sources variable for skipping "irrelevant" renames). This
effort obviously succeeded in drastically reducing computation times,
but should also theoretically allow partial clones to download much less
information. Previously, though, the "prefetch" command used in
diffcore-rename had never been modified and downloaded many blobs that
were unnecessary for merge-ort. This commit corrects that.
When doing basename comparisons, we want to fetch only the objects that
will be used for basename comparisons. If after basename fetching this
leaves us with no more relevant sources (or no more destinations), then
we won't need to do the full inexact rename detection and can skip
downloading additional source and destination files. Even if we have to
do that later full inexact rename detection, irrelevant sources are
culled after basename matching and before the full inexact rename
detection, so we can still avoid downloading the blobs for irrelevant
sources. Rename prefetch() to inexact_prefetch(), and introduce a
new basename_prefetch() to take advantage of this.
If we modify the testcase from commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2021-01-23)
to pass
--sparse --filter=blob:none
to the clone command, and use the new trace2 "fetch_count" output from
a few commits ago to track both the number of fetch subcommands invoked
and the number of objects fetched across all those fetches, then for
the mega-renames testcase we observe the following:
BEFORE this commit, rebasing 35 patches:
strategy # of fetches total # of objects fetched
--------- ------------ --------------------------
recursive 62 11423
ort 30 11391
AFTER this commit, rebasing the same 35 patches:
ort 32 63
This means that the new code only needs to download less than 2 blobs
per patch being rebased. That is especially interesting given that the
repository at the start only had approximately half a dozen TOTAL blobs
downloaded to start with (because the default sparse-checkout of just
the toplevel directory was in use).
So, for this particular linux kernel testcase that involved ~26,000
renames on the upstream side (drivers/ -> pilots/) across which 35
patches were being rebased, this change reduces the number of blobs that
need to be downloaded by a factor of ~180.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
estimate_similarity() was setting up a diff_populate_filespec_options
every time it was called, requiring the caller of estimate_similarity()
to pass in some data needed to set up this option. Currently the needed
data consisted of a single variable (skip_unmodified), but we want to
also have the different estimate_similarity() callsites start using
different missing_object_cb functions as well. Rather than also passing
that data in, just have the caller pass in the whole
diff_populate_filespec_options, and reduce the number of times we need to
set it up.
As a side note, this also drops the number of calls to
has_promisor_remote() dramatically. If L is the number of basename
paths to compare, M is the number of inexact sources, and N is the
number of inexact destinations, then the number of calls to
has_promisor_remote() drops from L+M*N down to at most 2 -- one for each
of the sites that calls estimate_similarity(). has_promisor_remote() is
a very fast function so this almost certainly has no measurable
performance impact, but it seems cleaner to avoid calling that function
so many times.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei found that the word "shallow" has an extra letter "l" in
"po/zh_CN.po". There are similar typos in other l10n files.
Reported-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Two spaces unaligned to anything is not part of the coding-style. A
single tab is.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to store ran_ff. Now it's obvious from the conditionals.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently "git pull --rebase" takes a shortcut in the case a
fast-forward merge is possible; run_merge() is called with --ff-only.
However, "git merge" didn't have an --autostash option, so, when "git
pull --rebase --autostash" was called *and* the fast-forward merge
shortcut was taken, then the pull failed.
This was fixed in commit f15e7cf5cc (pull: ff --rebase --autostash
works in dirty repo, 2017-06-01) by simply skipping the fast-forward
merge shortcut.
Later on "git merge" learned the --autostash option [a03b55530a
(merge: teach --autostash option, 2020-04-07)], and so did "git pull"
[d9f15d37f1 (pull: pass --autostash to merge, 2020-04-07)].
Therefore it's not necessary to skip the fast-forward merge shortcut
anymore when called with --rebase --autostash.
Let's always take the fast-forward merge shortcut by essentially
reverting f15e7cf5cc.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent update to contrib/completion/git-completion.bash causes bash to fail
auto complete custom commands that are wrapped with __git_func_wrap. Declaring
__git_cmd_idx=0 inside __git_func_wrap resolves the issue.
Signed-off-by: Fabian Wermelinger <fabianw@mavt.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Not all systems have a readlink program available for use by the shell.
This causes t3210 to fail on at least AIX. Let's provide a perl
one-liner to do the same thing, and use it there.
I also updated calls in t9802. Nobody reported failure there, but it's
the same issue. Presumably nobody actually tests with p4 on AIX in the
first place (if it is even available there).
I left the use of readlink in the "--valgrind" setup in test-lib.sh, as
valgrind isn't available on exotic platforms anyway (and I didn't want
to increase dependencies between test-lib.sh and test-lib-functions.sh).
There's one other curious case. Commit d2addc3b96 (t7800: readlink may
not be available, 2016-05-31) fixed a similar case. We can't use our
wrapper function there, though, as it's inside a sub-script triggered by
Git. It uses a slightly different technique ("ls" piped to "sed"). I
chose not to use that here as it gives confusing "ls -l" output if the
file is unexpectedly not a symlink (which is OK for its limited use, but
potentially confusing for general use within the test suite). The perl
version emits the empty string.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new function "get_abbrev_oid" to get abbrev object ID. This
function has a default value which helps to prepare a nonempty replace
pattern for sed command. An empty replace pattern may cause sed fail
to allocate memory.
Refactor function "make_user_friendly_and_stable_output" to use
"get_abbrev_oid" to get abbrev object ID.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor function "create_commits_in" to use "test_commit" to create
commit.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before comparing with the expect file, we used to call function
"make_user_friendly_and_stable_output" to filter out trailing spaces in
output. Ævar recommends using pattern "s/Z$//" to prepare expect file,
and then compare it with raw output.
Since we have fixed the issue of occasionally missing the clear-to-eol
suffix when displaying sideband #2 messages, it is safe and stable to
test against raw output.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "demultiplex_sideband()" sees a nonempty message ending with CR or
LF on the sideband #2, it adds "suffix" string to clear to the end of
the current line, which helps when relaying a progress display whose
records are terminated with CRs. But if it sees a single LF, no
clear-to-end suffix should be appended, because this single LF is used
to end the progress display by moving to the next line, and the final
progress display above should be preserved.
However, the code forgot that depending on the length of the payload
line, such a CR may fall exactly at the packet boundary and the
number of bytes before the CR from the beginning of the packet could
be zero. In such a case, the message that was terminated by the CR
were leftover in the "scratch" buffer in the previous call to the
function and we still need to clear to the end of the current line.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ævar reported that the function `make_user_friendly_and_stable_output()`
failed on a i386 box (gcc45) in the gcc farm boxes with error:
sed: couldn't re-allocate memory
It turns out that older versions of bash (4.3) or dash (0.5.7) cannot
evaluate expression like `${A%${A#???????}}` used to get the leading 7
characters of variable A.
Replace the incompatible parameter expansion so that t6020 works on
older version of bash or dash.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These typos were found while searching the codebase for gendered
pronouns. In the case of t9300-fast-import.sh, remove a confusing
comment that is unnecessary to the understanding of the test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid specifying the gender of our users in order to be
more inclusive, but sometimes a few slip by due to habit.
Since by doing a little bit of rewording we can avoid this irrelevant
detail, let's do so.
Inspired-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using gendered pronouns for an anonymous person applies a gender where
none is known and further excludes readers who do not use gendered
pronouns. Avoid such examples in the documentation by using "they" or
passive voice to avoid the need for a pronoun.
Inspired-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a missing test for the behavior of the pre-auto-gc hook added in
0b85d92661 (Documentation/hooks: add pre-auto-gc hook, 2008-04-02).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start by creating an "actual" file in a core.hooksPath test that has
the hook echoing to the "actual" file.
We later test_cmp that file to see what hooks were run. If we fail to
run our hook(s) we'll have an empty list of hooks for the test_cmp
instead of a nonexisting file. For the logic of this test that makes more sense.
See 867ad08a26 (hooks: allow customizing where the hook directory is,
2016-05-04) for the commit that added these tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without the "d", it sounds like a command, not an error, and is liable
to be translated incorrectly.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modernize test code added in ce567d1867 (Add test to show that
show-branch misses out the 8th column, 2008-07-23) and
11ee57bc4c (sort_in_topological_order(): avoid setting a commit flag,
2008-07-23) to use test helpers.
I'm renaming "out" to "actual" for consistency with other tests, and
introducing a "branches.sorted" file in the setup, to make it clear
that it's important that the list be sorted in this particular way.
The "show-branch" output is indented with spaces, which would cause
complaints under "git show --check" with an indented here-doc
block. Let's prefix the lines with "> " to work around that, and to
make it clear that the leading whitespace is important.
We can also get rid of the hardcoding of "main" added here in
334afbc76f (tests: mark tests relying on the current default for
`init.defaultBranch`, 2020-11-18). For this test we're setting up an
"initial" commit anyway, and now that we've moved over to test_commit
we can reference that instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the only *show-branch* test file to indicate that more tests
belong it in than just the one-off octopus test it now contains.
The test was initially added in ce567d1867 (Add test to show that
show-branch misses out the 8th column, 2008-07-23) and
11ee57bc4c (sort_in_topological_order(): avoid setting a commit flag,
2008-07-23). Those two add almost the same content, one with a
test_expect_success and the other a test_expect_failure (a bug being
tested for was fixed on one of the branches).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
report_result() sends a struct to the parent process, but that struct
would contain uninitialised padding bytes. Running this code under MSAN
rightly triggers a warning - but we don't particularly care about this
warning because we control the receiving code, and we therefore know
that those padding bytes won't be read on the receiving end.
We could simply suppress this warning under MSAN with the approporiate
ifdef'd attributes, but a less intrusive solution is to 0-initialise the
struct, which guarantees that the padding will also be initialised.
Interestingly, in the error-case branch, we only try to copy the first
two members of pc_item_result, by copying only PC_ITEM_RESULT_BASE_SIZE
bytes. However PC_ITEM_RESULT_BASE_SIZE is defined as
'offsetof(the_last_member)', which means that we're copying padding bytes
after the end of the second last member. We could avoid doing this by
redefining PC_ITEM_RESULT_BASE_SIZE as
'offsetof(second_last_member) + sizeof(second_last_member)', but there's
no huge benefit to doing so (and this patch silences the MSAN warning in
this scenario either way).
MSAN output from t2080 (partially interleaved due to the
parallel work :) ):
Uninitialized bytes in __interceptor_write at offset 12 inside [0x7fff37d83408, 160)
==23279==WARNING: MemorySanitizer: use-of-uninitialized-value
Uninitialized bytes in __interceptor_write at offset 12 inside [0x7ffdb8a07ec8, 160)
==23280==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0xd5ac28 in xwrite /home/ahunt/git/git/wrapper.c:256:8
#1 0xd5b327 in write_in_full /home/ahunt/git/git/wrapper.c:311:21
#2 0xb0a8c4 in do_packet_write /home/ahunt/git/git/pkt-line.c:221:6
#3 0xb0a5fd in packet_write /home/ahunt/git/git/pkt-line.c:242:6
#4 0x4f7441 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:69:2
#5 0x4f6be6 in worker_loop /home/ahunt/git/git/builtin/checkout--worker.c💯3
#6 0x4f68d3 in cmd_checkout__worker /home/ahunt/git/git/builtin/checkout--worker.c:143:2
#7 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
#8 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
#9 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
#10 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
#11 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
#12 0x7f8778114349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#13 0x421bd9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120
Uninitialized value was created by an allocation of 'res' in the stack frame of function 'report_result'
#0 0x4f72c0 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:55
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/ahunt/git/git/wrapper.c:256:8 in xwrite
Exiting
#0 0xd5ac28 in xwrite /home/ahunt/git/git/wrapper.c:256:8
#1 0xd5b327 in write_in_full /home/ahunt/git/git/wrapper.c:311:21
#2 0xb0a8c4 in do_packet_write /home/ahunt/git/git/pkt-line.c:221:6
#3 0xb0a5fd in packet_write /home/ahunt/git/git/pkt-line.c:242:6
#4 0x4f7441 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:69:2
#5 0x4f6be6 in worker_loop /home/ahunt/git/git/builtin/checkout--worker.c💯3
#6 0x4f68d3 in cmd_checkout__worker /home/ahunt/git/git/builtin/checkout--worker.c:143:2
#7 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
#8 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
#9 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
#10 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
#11 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
#12 0x7f2749a0e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#13 0x421bd9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120
Uninitialized value was created by an allocation of 'res' in the stack frame of function 'report_result'
#0 0x4f72c0 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:55
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/ahunt/git/git/wrapper.c:256:8 in xwrite
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache_entry contains an object_id, and compare_ce_content() would
include that field when calling memcmp on a subset of the cache_entry.
Depending on which hashing algorithm is being used, only part of
object_id.hash is actually being used, therefore including it in a
memcmp() is incorrect. Instead we choose to exclude the object_id when
calling memcmp(), and call oideq() separately.
This issue was found when running t1700-split-index with MSAN, see MSAN
output below (on my machine, offset 76 corresponds to 4 bytes after the
start of object_id.hash).
Uninitialized bytes in MemcmpInterceptorCommon at offset 76 inside [0x7f60e7c00118, 92)
==27914==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4524ee in memcmp /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:873:10
#1 0xc867ae in compare_ce_content /home/ahunt/git/git/split-index.c:208:8
#2 0xc859fb in prepare_to_write_split_index /home/ahunt/git/git/split-index.c:336:9
#3 0xb4bbca in write_split_index /home/ahunt/git/git/read-cache.c:3107:2
#4 0xb42b4d in write_locked_index /home/ahunt/git/git/read-cache.c:3295:8
#5 0x638058 in try_merge_strategy /home/ahunt/git/git/builtin/merge.c:758:7
#6 0x63057f in cmd_merge /home/ahunt/git/git/builtin/merge.c:1663:9
#7 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
#8 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
#9 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
#10 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
#11 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
#12 0x7f60e928e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#13 0x421bd9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
#1 0xb4d1e6 in dup_cache_entry /home/ahunt/git/git/read-cache.c:3457:2
#2 0xd214fa in add_entry /home/ahunt/git/git/unpack-trees.c:215:18
#3 0xd1fae0 in keep_entry /home/ahunt/git/git/unpack-trees.c:2276:2
#4 0xd1ff9e in twoway_merge /home/ahunt/git/git/unpack-trees.c:2504:11
#5 0xd27028 in call_unpack_fn /home/ahunt/git/git/unpack-trees.c:593:12
#6 0xd2443d in unpack_nondirectories /home/ahunt/git/git/unpack-trees.c:1106:12
#7 0xd19435 in unpack_callback /home/ahunt/git/git/unpack-trees.c:1306:6
#8 0xd0d7ff in traverse_trees /home/ahunt/git/git/tree-walk.c:532:17
#9 0xd1773a in unpack_trees /home/ahunt/git/git/unpack-trees.c:1683:9
#10 0xdc6370 in checkout /home/ahunt/git/git/merge-ort.c:3590:8
#11 0xdc51c3 in merge_switch_to_result /home/ahunt/git/git/merge-ort.c:3728:7
#12 0xa195a9 in merge_ort_recursive /home/ahunt/git/git/merge-ort-wrappers.c:58:2
#13 0x637fff in try_merge_strategy /home/ahunt/git/git/builtin/merge.c:751:12
#14 0x63057f in cmd_merge /home/ahunt/git/git/builtin/merge.c:1663:9
#15 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
#16 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
#17 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
#18 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
#19 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
Uninitialized value was created by a heap allocation
#0 0x44e73d in malloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/msan_interceptors.cpp:901:3
#1 0xd592f6 in do_xmalloc /home/ahunt/git/git/wrapper.c:41:8
#2 0xd59248 in xmalloc /home/ahunt/git/git/wrapper.c:62:9
#3 0xa17088 in mem_pool_alloc_block /home/ahunt/git/git/mem-pool.c:22:6
#4 0xa16f78 in mem_pool_init /home/ahunt/git/git/mem-pool.c:44:3
#5 0xb481b8 in load_all_cache_entries /home/ahunt/git/git/read-cache.c
#6 0xb44d40 in do_read_index /home/ahunt/git/git/read-cache.c:2298:17
#7 0xb48a1b in read_index_from /home/ahunt/git/git/read-cache.c:2389:8
#8 0xbd5a0b in repo_read_index /home/ahunt/git/git/repository.c:276:8
#9 0xb4bcaf in repo_read_index_unmerged /home/ahunt/git/git/read-cache.c:3326:2
#10 0x62ed26 in cmd_merge /home/ahunt/git/git/builtin/merge.c:1362:6
#11 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
#12 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
#13 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
#14 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
#15 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
#16 0x7f60e928e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:873:10 in memcmp
Exiting
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the mktag --no-strict test to actually test success under
--no-strict, that test was added in 06ce79152b (mktag: add a
--[no-]strict option, 2021-01-06).
It doesn't make sense to check that we have the same failure except
when we want --no-strict, by doing that we're assuming that the
behavior will be different under --no-strict, bun nothing was testing
for that.
We should instead assert that --strict is the same as --no-strict,
except in the cases where we've declared that it's not.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change check_verify_failure() helper to parse out options from
$@. This makes it easier to add new options in the future. See
06ce79152b (mktag: add a --[no-]strict option, 2021-01-06) for the
initial implementation.
Let's also replace "" quotes with '' for the test body, the varables
we need are eval'd into the body, so there's no need for the quoting
confusion.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, both forward and backslash are valid separators. In
22d5507493 (subtree: don't fuss with PATH, 2021-04-27), however, we
added code that assumes that it can only be the forward slash.
Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 22d5507493 (subtree: don't fuss with PATH, 2021-04-27), `git
subtree` was broken thoroughly on Windows.
The reason is that it assumes Unix semantics, where `PATH` is
colon-separated, and it assumes that `$GIT_EXEC_PATH:` is a verbatim
prefix of `$PATH`. Neither are true, the latter in particular because
`GIT_EXEC_PATH` is a Windows-style path, while `PATH` is a Unix-style
path list.
Let's make extra certain that `$GIT_EXEC_PATH` and the first component
of `$PATH` refer to different entities before erroring out.
We do that by using the `test <path1> -ef <path2>` command that verifies
that the inode of `<path1>` and of `<path2>` is the same.
Sadly, this construct is non-portable, according to
https://pubs.opengroup.org/onlinepubs/009695399/utilities/test.html.
However, it does not matter in practice because we still first look
whether `$GIT_EXEC_PREFIX` is string-identical to the first component of
`$PATH`. This will give us the expected result everywhere but in Git for
Windows, and Git for Windows' own Bash _does_ handle the `-ef` operator.
Just in case that we _do_ need to show the error message _and_ are
running in a shell that lacks support for `-ef`, we simply suppress the
error output for that part.
This fixes https://github.com/git-for-windows/git/issues/3260
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an object is already mentioned in a reachability bitmap we are
building, then by definition so are all of the objects it can reach. We
have an optimization to stop traversing commits when we see they are
already in the bitmap, but we don't do the same for trees.
It's generally unavoidable to recurse into trees for commits not yet
covered by bitmaps (since most commits generally do have unique
top-level trees). But they usually have subtrees that are shared with
other commits (i.e., all of the subtrees the commit _didn't_ touch). And
some of those commits (and their trees) may be covered by the bitmap.
Usually this isn't _too_ big a deal, because we'll visit those subtrees
only once in total for the whole walk. But if you have a large number of
unbitmapped commits, and if your tree is big, then you may end up
opening a lot of sub-trees for no good reason.
We can use the same optimization we do for commits here: when we are
about to open a tree, see if it's in the bitmap (either the one we are
building, or the "seen" bitmap which covers the UNINTERESTING side of
the bitmap when doing a set-difference).
This works especially well because we'll visit all commits before
hitting any trees. So even in a history like:
A -- B
if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in
the results from A before looking at B's tree (so we really will only
look at trees touched by B).
For most repositories, the timings produced by p5310 are unspectacular.
Here's linux.git:
Test HEAD^ HEAD
--------------------------------------------------------------------
5310.4: simulated clone 6.00(5.90+0.10) 5.98(5.90+0.08) -0.3%
5310.5: simulated fetch 2.98(5.45+0.18) 2.85(5.31+0.18) -4.4%
5310.7: rev-list (commits) 0.32(0.29+0.03) 0.33(0.30+0.03) +3.1%
5310.8: rev-list (objects) 1.48(1.44+0.03) 1.49(1.44+0.05) +0.7%
Any improvement there is within the noise (the +3.1% on test 7 has to be
noise, since we are not recursing into trees, and thus the new code
isn't even run). The results for git.git are likewise uninteresting.
But here are numbers from some other real-world repositories (that are
not public). This one's tree is comparable in size to linux.git, but has
~16k refs (and so less complete bitmap coverage):
Test HEAD^ HEAD
-------------------------------------------------------------------------
5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5%
5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9%
5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0%
5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8%
And here's another with a very large tree (~340k entries), and a fairly
large number of refs (~10k):
Test HEAD^ HEAD
-------------------------------------------------------------------------
5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1%
5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6%
5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6%
5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5%
This patch provides substantial improvements in these larger cases, and
have any drawbacks for smaller ones (the cost of the bitmap check is
quite small compared to an actual tree traversal).
Note that we have to add a version of revision.c's include_check
callback which handles non-commits. We could possibly consolidate this
into a single callback for all objects types, as there's only one user
of the feature which would need converted (pack-bitmap.c:should_include).
That would in theory let us avoid duplicating any logic. But when I
tried it, the code ended up much worse to read, with lots of repeated
"if it's a commit do this, otherwise do that". Having two separate
callbacks splits that naturally, and matches the existing split of
show_commit/show_object callbacks.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test clean-up.
* ab/test-lib-updates:
test-lib: split up and deprecate test_create_repo()
test-lib: do not show advice about init.defaultBranch under --verbose
test-lib: reformat argument list in test_create_repo()
submodule tests: use symbolic-ref --short to discover branch name
test-lib functions: add --printf option to test_commit
describe tests: convert setup to use test_commit
test-lib functions: add an --annotated option to "test_commit"
test-lib-functions: document test_commit --no-tag
test-lib-functions: reword "test_commit --append" docs
test-lib tests: remove dead GIT_TEST_FRAMEWORK_SELFTEST variable
test-lib: bring $remove_trash out of retirement
The "-m" option in "git log -m" that does not specify which format,
if any, of diff is desired did not have any visible effect; it now
implies some form of diff (by default "--patch") is produced.
* so/log-m-implies-p:
diff-merges: let "-m" imply "-p"
diff-merges: rename "combined_imply_patch" to "merges_imply_patch"
stash list: stop passing "-m" to "git log"
git-svn: stop passing "-m" to "git rev-list"
diff-merges: move specific diff-index "-m" handling to diff-index
t4013: test "git diff-index -m"
t4013: test "git diff-tree -m"
t4013: test "git log -m --stat"
t4013: test "git log -m --raw"
t4013: test that "-m" alone has no effect in "git log"
Optimize out repeated rename detection in a sequence of mergy
operations.
* en/ort-perf-batch-11:
merge-ort, diffcore-rename: employ cached renames when possible
merge-ort: handle interactions of caching and rename/rename(1to1) cases
merge-ort: add helper functions for using cached renames
merge-ort: preserve cached renames for the appropriate side
merge-ort: avoid accidental API mis-use
merge-ort: add code to check for whether cached renames can be reused
merge-ort: populate caches of rename detection results
merge-ort: add data structures for in-memory caching of rename detection
t6429: testcases for remembering renames
fast-rebase: write conflict state to working tree, index, and HEAD
fast-rebase: change assert() to BUG()
Documentation/technical: describe remembering renames optimization
t6423: rename file within directory that other side renamed
"git fetch" over protocol v2 left its side of the socket open after
it finished speaking, which unnecessarily wasted the resource on
the other side.
* jk/fetch-pack-v2-half-close-early:
fetch-pack: signal v2 server that we are done making requests
Use the hashfile API in the codepath that writes the index file to
reduce code duplication.
* ds/write-index-with-hashfile-api:
read-cache: delete unused hashing methods
read-cache: use hashfile instead of git_hash_ctx
csum-file.h: increase hashfile buffer size
hashfile: use write_in_full()
Recent "git clone" left a temporary directory behind when the
transport layer returned an failure.
* jk/clone-clean-upon-transport-error:
clone: clean up directory after transport_fetch_refs() failure
"git send-email" learned the "--sendmail-cmd" command line option
and the "sendemail.sendmailCmd" configuration variable, which is a
more sensible approach than the current way of repurposing the
"smtp-server" that is meant to name the server to instead name the
command to talk to the server.
* ga/send-email-sendmail-cmd:
git-send-email: add option to specify sendmail command
The code to handle the "--format" option in "for-each-ref" and
friends made too many string comparisons on %(atom)s used in the
format string, which has been corrected by converting them into
enum when the format string is parsed.
* zh/ref-filter-atom-type:
ref-filter: introduce enum atom_type
ref-filter: add objectsize to used_atom
Fix typos in documentation, code comments, and RelNotes which repeat
various words. In trivial cases, just delete the duplicated word and
rewrap text, if needed. Reword the affected sentence in
Documentation/RelNotes/1.8.4.txt for it to make sense.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It does not make sense to attempt to set MSGFMT_EXE when NO_GETTEXT is
configured, as such add a check for NO_GETTEXT before attempting to set
it.
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users have expressed interest in a more "batteries included" way of
building via CMake[1], and a big part of that is providing easier access
to tooling external tools.
A straightforward way to accomplish this is to make it as simple as
possible is to enable the generation of the compile_commands.json file,
which is supported by many tools such as: clang-tidy, clang-format,
sourcetrail, etc.
This does come with a small run-time overhead during the configuration
step (~6 seconds on my machine):
Time to configure with CMAKE_EXPORT_COMPILE_COMMANDS=TRUE
real 1m9.840s
user 0m0.031s
sys 0m0.031s
Time to configure with CMAKE_EXPORT_COMPILE_COMMANDS=FALSE
real 1m3.195s
user 0m0.015s
sys 0m0.015s
This seems like a small enough price to pay to make the project more
accessible to newer users. Additionally there are other large projects
like llvm [2] which has had this enabled by default for >6 years at the
time of this writing, and no real negative consequences that I can find
with my search-skills.
NOTE: That the compile_commands.json is currently produced only when
using the Ninja and Makefile generators. See The CMake documentation[3]
for more info.
1: https://lore.kernel.org/git/CAOjrSZusMSvs7AS-ZDsV8aQUgsF2ZA754vSDjgFKMRgi_oZAWw@mail.gmail.com/
2: 2c5712051b
3: https://cmake.org/cmake/help/latest/variable/CMAKE_EXPORT_COMPILE_COMMANDS.html
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When building on windows users have the option to use vcpkg to provide
the dependencies needed to compile. Previously, this was used only when
using the Visual Studio generator which was not ideal because:
- Not all users who want to use vcpkg use the Visual Studio
generators.
- Some versions of Visual Studio 2019 moved away from using the
VS 2019 generator by default, making it impossible for Visual
Studio to configure the project in the likely event that it couldn't
find the dependencies.
- Inexperienced users of CMake are very likely to get tripped up by
the errors caused by a lack of vcpkg, making the above bullet point
both annoying and hard to debug.
As such, let's make using vcpkg the default on windows. Users who want
to avoid using vcpkg can disable it by passing -DNO_VCPKG=TRUE.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multimail project is developed independently and has its own project
page. Traditionally, we shipped a copy in contrib/.
However, such a copy is prone to become stale, and users are much better
served to be directed to the actual project instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ibuf can be reused for multiple iterations of the loop. Specifically:
deflate() overwrites s.avail_in to show how much of the input buffer
has not been processed yet - and sometimes leaves 'avail_in > 0', in
which case ibuf will be processed again during the loop's subsequent
iteration.
But if we declare ibuf within the loop, then (in theory) we get a new
(and uninitialised) buffer for every iteration. In practice, my compiler
seems to resue the same buffer - meaning that this code does work - but
it doesn't seem safe to rely on this behaviour. MSAN correctly catches
this issue - as soon as we hit the 's.avail_in > 0' condition, we end up
reading from what seems to be uninitialised memory.
Therefore, we move ibuf out of the loop, making this reuse safe.
See MSAN output from t1050-large below - the interesting part is the
ibuf creation at the end, although there's a lot of indirection before
we reach the read from unitialised memory:
==11294==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f75db58fb1c in crc32_little crc32.c:283:9
#1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
#2 0x7f75db59668c in crc32 crc32.c:242:12
#3 0x8c94f8 in hashwrite csum-file.c:101:15
#4 0x825faf in stream_to_pack bulk-checkin.c:154:5
#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#7 0xa7cff2 in index_stream object-file.c:2234:9
#8 0xa7bff7 in index_fd object-file.c:2256:9
#9 0xa7d22d in index_path object-file.c:2274:7
#10 0xb3c8c9 in add_to_index read-cache.c:802:7
#11 0xb3e039 in add_file_to_index read-cache.c:835:9
#12 0x4a99c3 in add_files add.c:458:7
#13 0x4a7276 in cmd_add add.c:670:18
#14 0x4a1e76 in run_builtin git.c:461:11
#15 0x49e1e7 in handle_builtin git.c:714:3
#16 0x4a0c08 in run_argv git.c:781:4
#17 0x49d5a8 in cmd_main git.c:912:19
#18 0x7974da in main common-main.c:52:11
#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#20 0x421bd9 in _start start.S:120
Uninitialized value was stored to memory at
#0 0x7f75db58fa6b in crc32_little crc32.c:283:9
#1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
#2 0x7f75db59668c in crc32 crc32.c:242:12
#3 0x8c94f8 in hashwrite csum-file.c:101:15
#4 0x825faf in stream_to_pack bulk-checkin.c:154:5
#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#7 0xa7cff2 in index_stream object-file.c:2234:9
#8 0xa7bff7 in index_fd object-file.c:2256:9
#9 0xa7d22d in index_path object-file.c:2274:7
#10 0xb3c8c9 in add_to_index read-cache.c:802:7
#11 0xb3e039 in add_file_to_index read-cache.c:835:9
#12 0x4a99c3 in add_files add.c:458:7
#13 0x4a7276 in cmd_add add.c:670:18
#14 0x4a1e76 in run_builtin git.c:461:11
#15 0x49e1e7 in handle_builtin git.c:714:3
#16 0x4a0c08 in run_argv git.c:781:4
#17 0x49d5a8 in cmd_main git.c:912:19
#18 0x7974da in main common-main.c:52:11
#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db5c2011 in flush_pending deflate.c:746:5
#2 0x7f75db5cafa0 in deflate_stored deflate.c:1815:9
#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#4 0xd80b7f in git_deflate zlib.c:244:12
#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#8 0xa7cff2 in index_stream object-file.c:2234:9
#9 0xa7bff7 in index_fd object-file.c:2256:9
#10 0xa7d22d in index_path object-file.c:2274:7
#11 0xb3c8c9 in add_to_index read-cache.c:802:7
#12 0xb3e039 in add_file_to_index read-cache.c:835:9
#13 0x4a99c3 in add_files add.c:458:7
#14 0x4a7276 in cmd_add add.c:670:18
#15 0x4a1e76 in run_builtin git.c:461:11
#16 0x49e1e7 in handle_builtin git.c:714:3
#17 0x4a0c08 in run_argv git.c:781:4
#18 0x49d5a8 in cmd_main git.c:912:19
#19 0x7974da in main common-main.c:52:11
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db644241 in _tr_stored_block trees.c:873:5
#2 0x7f75db5cad7c in deflate_stored deflate.c:1813:9
#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#4 0xd80b7f in git_deflate zlib.c:244:12
#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#8 0xa7cff2 in index_stream object-file.c:2234:9
#9 0xa7bff7 in index_fd object-file.c:2256:9
#10 0xa7d22d in index_path object-file.c:2274:7
#11 0xb3c8c9 in add_to_index read-cache.c:802:7
#12 0xb3e039 in add_file_to_index read-cache.c:835:9
#13 0x4a99c3 in add_files add.c:458:7
#14 0x4a7276 in cmd_add add.c:670:18
#15 0x4a1e76 in run_builtin git.c:461:11
#16 0x49e1e7 in handle_builtin git.c:714:3
#17 0x4a0c08 in run_argv git.c:781:4
#18 0x49d5a8 in cmd_main git.c:912:19
#19 0x7974da in main common-main.c:52:11
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db5c8fcf in deflate_stored deflate.c:1783:9
#2 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#3 0xd80b7f in git_deflate zlib.c:244:12
#4 0x825dff in stream_to_pack bulk-checkin.c:140:12
#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#7 0xa7cff2 in index_stream object-file.c:2234:9
#8 0xa7bff7 in index_fd object-file.c:2256:9
#9 0xa7d22d in index_path object-file.c:2274:7
#10 0xb3c8c9 in add_to_index read-cache.c:802:7
#11 0xb3e039 in add_file_to_index read-cache.c:835:9
#12 0x4a99c3 in add_files add.c:458:7
#13 0x4a7276 in cmd_add add.c:670:18
#14 0x4a1e76 in run_builtin git.c:461:11
#15 0x49e1e7 in handle_builtin git.c:714:3
#16 0x4a0c08 in run_argv git.c:781:4
#17 0x49d5a8 in cmd_main git.c:912:19
#18 0x7974da in main common-main.c:52:11
#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db5ea545 in read_buf deflate.c:1181:5
#2 0x7f75db5c97f7 in deflate_stored deflate.c:1791:9
#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#4 0xd80b7f in git_deflate zlib.c:244:12
#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#8 0xa7cff2 in index_stream object-file.c:2234:9
#9 0xa7bff7 in index_fd object-file.c:2256:9
#10 0xa7d22d in index_path object-file.c:2274:7
#11 0xb3c8c9 in add_to_index read-cache.c:802:7
#12 0xb3e039 in add_file_to_index read-cache.c:835:9
#13 0x4a99c3 in add_files add.c:458:7
#14 0x4a7276 in cmd_add add.c:670:18
#15 0x4a1e76 in run_builtin git.c:461:11
#16 0x49e1e7 in handle_builtin git.c:714:3
#17 0x4a0c08 in run_argv git.c:781:4
#18 0x49d5a8 in cmd_main git.c:912:19
#19 0x7974da in main common-main.c:52:11
Uninitialized value was created by an allocation of 'ibuf' in the stack frame of function 'stream_to_pack'
#0 0x825710 in stream_to_pack bulk-checkin.c:101
SUMMARY: MemorySanitizer: use-of-uninitialized-value crc32.c:283:9 in crc32_little
Exiting
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compiling with -O3, some gcc versions (10.2.1 here) complain about
an out-of-bounds subscript:
revision.c: In function ‘do_add_index_objects_to_pending’:
revision.c:321:22: error: array subscript [1, 2147483647] is outside array bounds of ‘char[1]’ [-Werror=array-bounds]
321 | if (0 < len && name[len] && buf.len)
| ~~~~^~~~~
The "len" parameter here comes from calling interpret_branch_name(),
which intends to return the number of characters of "name" it parsed.
But the compiler doesn't realize this. It knows the size of the empty
string "name" passed in from do_add_index_objects_to_pending(), but it
has no clue that the "len" we get back will be constrained to "0" in
that case.
And I don't think the warning is telling us about some subtle or clever
bug. The implementation of interpret_branch_name() is in another file
entirely, and the compiler can't see it (you can even verify there is no
clever LTO going on by replacing it with "return 0" and still getting
the warning).
We can work around this by replacing our "did we hit the trailing NUL"
subscript dereference with a length check. We do not even have to pay
the cost for an extra strlen(), as we can pass our new length into
interpret_branch_name(), which was converting our "0" into a call to
strlen() anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "path" parameter to ll_union_merge() is named "path_unused", since
we don't ourselves use it. But we do pass it to ll_xdl_merge(), which
may look at it (it gets passed to ll_binary_merge(), which may pass it
to warning()). Let's rename it to correct this inaccuracy (both of the
other functions correctly do not call this "unused").
Note that we also pass drv_unused, but it truly is unused by the rest of
the stack (it only exists at all to provide a generic interface that
matches what ll_ext_merge() needs).
Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cd1d61c44f (make union merge an xdl merge favor, 2010-03-01), we
pass NULL to ll_xdl_merge() for the "name" labels of the ancestor, ours
and theirs buffers. We usually use these for annotating conflict markers
left in a file. For a union merge, these shouldn't matter; the point of
it is that we'd never leave conflict markers in the first place.
But there is one code path where we may dereference them: if the file
contents appear to be binary, ll_binary_merge() will give up and pass
them to warning() to generate a message for the user (that was true even
when cd1d61c44f was written, though the warning was in ll_xdl_merge()
back then).
That can result in a segfault, though on many systems (including glibc),
the printf routines will helpfully just say "(null)" instead. We can
extend our binary-union test in t6406 to check stderr, which catches the
problem on all systems.
This also fixes a warning from "gcc -O3". Unlike lower optimization
levels, it inlines enough to see that the NULL can make it to warning()
and complains:
In function ‘ll_binary_merge’,
inlined from ‘ll_xdl_merge’ at ll-merge.c:115:10,
inlined from ‘ll_union_merge’ at ll-merge.c:151:9:
ll-merge.c:74:4: warning: ‘%s’ directive argument is null [-Wformat-overflow=]
74 | warning("Cannot merge binary files: %s (%s vs. %s)",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
75 | path, name1, name2);
| ~~~~~~~~~~~~~~~~~~~
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prior to commit a944af1d86 (merge: teach -Xours/-Xtheirs to binary
ll-merge driver, 2012-09-08), we always reported a conflict from
ll_binary_merge() by returning "1" (in the xdl_merge and ll_merge code,
this value is the number of conflict hunks). After that commit, we
report zero conflicts if the "variant" flag is set, under the assumption
that it is one of XDL_MERGE_FAVOR_OURS or XDL_MERGE_FAVOR_THEIRS.
But this gets confused by XDL_MERGE_FAVOR_UNION. We do not know how to
do a binary union merge, but erroneously report no conflicts anyway (and
just blindly use the "ours" content as the result).
Let's tighten our check to just the cases that a944af1d86 meant to
cover. This fixes the union case (which existed already back when that
commit was made), as well as future-proofing us against any other
variants that get added later.
Note that you can't trigger this from "git merge-file --union", as that
bails on binary files before even calling into the ll-merge machinery.
The test here uses the "union" merge attribute, which does erroneously
report a successful merge.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of "fast-forward" in the glossary has been updated.
* ry/clarify-fast-forward-in-glossary:
docs: improve fast-forward in glossary content
The parallel checkout codepath did not initialize object ID field
used to talk to the worker processes in a futureproof way.
* mt/parallel-checkout-with-padded-oidcpy:
parallel-checkout: send the new object_id algo field to the workers
We historically rejected a very short string as an author name
while accepting a patch e-mail, which has been loosened.
* ef/mailinfo-short-name:
mailinfo: don't discard names under 3 characters
Add some notes in the code about invariants with match_mask when adding
pairs. Also add a comment that seems to have been left out in my work
of pushing these changes upstream.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A random hodge-podge of incorrect or out-of-date comments that I found:
* t6423 had a comment that has referred to the wrong test for years;
fix it to refer to the right one.
* diffcore-rename had a FIXME comment meant to remind myself to
investigate if I could make another code change. I later
investigated and removed the FIXME, but while cherry-picking the
patch to submit upstream I missed the later update. Remove the
comment now.
* merge-ort had the early part of a comment for a function; I had
meant to include the more involved description when I updated the
function. Update the comment now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The keys of break_idx are strings from the diff_filepairs of
diff_queued_diff. break_idx is only used in location_rename_dst(), and
that usage is always before any free'ing of the pairs (and thus the
strings in the pairs). As such, there is no need to strdup these keys;
we can just reuse the existing strings as-is.
The merge logic doesn't make use of break detection, so this does not
affect the performance of any of my testcases. It was just a minor
unrelated optimization noted in passing while looking at the code.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gathering accumulated times from trace2 output on the mega-renames
testcase, I saw the following timings (where I'm only showing a few
lines to highlight the portions of interest):
10.120 : label:incore_nonrecursive
4.462 : ..label:process_entries
3.143 : ....label:process_entries setup
2.988 : ......label:plist special sort
1.305 : ....label:processing
2.604 : ..label:collect_merge_info
2.018 : ..label:merge_start
1.018 : ..label:renames
In the above output, note that the 4.462 seconds for process_entries was
split as 3.143 seconds for "process_entries setup" and 1.305 seconds for
"processing" (and a little time for other stuff removed from the
highlight). Most of the "process_entries setup" time was spent on
"plist special sort" which corresponds to the following code:
trace2_region_enter("merge", "plist special sort", opt->repo);
plist.cmp = string_list_df_name_compare;
string_list_sort(&plist);
trace2_region_leave("merge", "plist special sort", opt->repo);
In other words, in a merge strategy that would be invoked by passing
"-sort" to either rebase or merge, sorting an array takes more time than
anything else. Serves me right for naming my merge strategy this way.
Rewrite the comparison function in a way that does not require finding
out the lengths of the strings when comparing them. While at it, tweak
the code for our specific case -- no need to handle a variety of modes,
for example. The combination of these changes reduced the time spent in
"plist special sort" by ~25% in the mega-renames case.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.622 s ± 0.059 s 5.235 s ± 0.042 s
mega-renames: 10.127 s ± 0.073 s 9.419 s ± 0.107 s
just-one-mega: 500.3 ms ± 3.8 ms 480.1 ms ± 3.9 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Existing checks for scissors characters using memcmp(3) never read past
the end of the line, because all substrings we are interested in are two
characters long, and the outer loop guarantees we have at least one
character. So at most we will look at the NUL.
However, this is too subtle and may lead to bugs in code which copies
this behavior without realizing substring length requirement. So use
starts_with() instead, which will stop at NUL regardless of the length
of the prefix. Remove extra pair of parentheses while we are here.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change various cmd_* functions that claim to return an "int" to use
"return" instead of exit() to indicate an exit code. These were not
marked with NORETURN, and by directly exit()-ing we'll skip the
cleanup git.c would otherwise do (e.g. closing fd's, erroring if we
can't). See run_builtin() in git.c.
In the case of shell.c and sh-i18n--envsubst.c this was the result of
an incomplete migration to using a cmd_main() in 3f2e2297b9 (add an
extra level of indirection to main(), 2016-07-01).
This was spotted by SunCC 12.5 on Solaris 10 (gcc210 on the gccfarm).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option is almost never a good idea, as the resulting repository is
larger and slower (see the new explanations in the docs).
I outlined the potential problems. We could go further and make the
option harder to find (or at least, make the command-line option
descriptions a much more terse "you probably don't want this; see
pack.packsizeLimit for details"). But this seems like a minimal change
that may prevent people from thinking it's more useful than it is.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some test-cases, UTF-8 locale is required. To find such locale,
we're using the first available UTF-8 locale that returned by
"locale -a".
However, the locale(1) utility is unavailable on some systems,
e.g. Linux with musl libc.
However, without "locale -a", we can't guess provided UTF-8 locale.
Add a Makefile knob GIT_TEST_UTF8_LOCALE and activate it for
linux-musl in our CI system.
Rename t/lib-git-svn.sh:prepare_a_utf8_locale to prepare_utf8_locale,
since we no longer prepare the variable named "a_utf8_locale",
but set up a fallback value for GIT_TEST_UTF8_LOCALE instead.
The fallback will be LC_ALL, LANG environment variable,
or the first UTF-8 locale from output of "locale -a", in that order.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a01f7f2ba0 (merge: enable defaulttoupstream by default,
2014-04-20) forgot to mention the new default in the configuration
documentation.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There're two different default options for log --decorate:
* Should `--decorate` be given without any arguments, it's default to
`short`
* Should neither `--decorate` nor `--no-decorate` be given, it's default
to the `log.decorate` or `auto`.
We documented the former, but not the latter.
Let's document them, too.
Reported-by: Andy AO <zen96285@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The native kdiff3 mergetool is not found by git mergetool on
Windows. The message "The merge tool kdiff3 is not available as
'kdiff3'" is displayed.
Just like we translate the name of the binary and look for it on the
search path for WinMerge, do the same for kdiff3 to find it.
Signed-off-by: Michael Schindler michael@compressconsult.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The xdl_bug() function was introduced in
e8adf23d1e (xdl_change_compact(): introduce the concept of a change
group, 2016-08-22), let's use our usual BUG() function instead.
We'll now have meaningful line numbers if we encounter bugs in xdiff,
and less code duplication.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't guard the calls to the progress.c API with "if (progress)". The
API itself will check this. This doesn't change any behavior, but
makes this code consistent with the rest of the codebase.
See ae9af12287 (status: show progress bar if refreshing the index
takes too long, 2018-09-15) for the commit that added the pattern
we're changing here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a trailing newline to the protocol-caps.h file added in the recent
a2ba162cda (object-info: support for retrieving object info,
2021-04-20). Various editors add this implicitly, and some compilers
warn about the lack of a \n here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing spaces before '&&' and switch tabs around '&&' to spaces.
These issues were found using `git grep '[^ ]&&$'` and
`git grep -P '&&\t'`.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dash bug https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097
lets the shell erroneously perform field splitting on the expansion of a
command substitution during declaration of a local variable. It causes
the parallel-checkout tests to fail e.g. when running them with
/bin/dash on MacOS 11.4, where they error out like this:
./t2080-parallel-checkout-basics.sh: 33: local: 0: bad variable name
That's because the output of wc -l contains leading spaces and the
returned number of lines is treated as another variable to declare, i.e.
as in "local workers= 0".
Work around it by enclosing the command substitution in quotes.
Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some platforms, like NonStop do not automatically restart fsync()
when interrupted by a signal, even when that signal is setup with
SA_RESTART.
This can lead to test breakage, e.g., where "--progress" is used,
thus SIGALRM is sent often, and can interrupt an fsync() syscall.
Make sure we deal with such a case by retrying the syscall
ourselves. Luckily, we call fsync() fron a single wrapper,
fsync_or_die(), so the fix is fairly isolated.
Reported-by: Randall S. Becker <randall.becker@nexbridge.ca>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
[jc: the above two did most of the work---I just tied the loose end]
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 87db61a (trace2: write discard message to sentinel files,
2019-10-04), we added a new "too_many_files" event for when trace2
logging would create too many files in an output directory.
Unfortunately, the api-trace2 doc described a "discard" event instead.
Fix the doc to use the correct event name.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-repack doc clearly states that it *does* operate on promisor
packfiles (in a separate partition), with "-a" specified. Presumably
the statements here are outdated, as they feature from the first doc
in 2017 (and the repack support was added in 2018)
Signed-off-by: Tao Klerks <tao@klerks.biz>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two "if (opt->all_objects)" blocks next
to each other, merge them into one to provide better
readability.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --batch code to print an object assumes we found out the type of
the object from calling oid_object_info_extended(). This is true for
the default format, but even in a custom format, we manually modify
the object_info struct to ask for the type.
This assumption was broken by 845de33a5b (cat-file: avoid noop calls
to sha1_object_info_extended, 2016-05-18). That commit skips the call
to oid_object_info_extended() entirely when --batch-all-objects is in
use, and the custom format does not include any placeholders that
require calling it.
Or when the custom format only include placeholders like %(objectname) or
%(rest), oid_object_info_extended() will not get the type of the object.
This results in an error when we try to confirm that the type didn't
change:
$ git cat-file --batch=batman --batch-all-objects
batman
fatal: object 000023961a changed type!?
and also has other subtle effects (e.g., we'd fail to stream a blob,
since we don't realize it's a blob in the first place).
We can fix this by flipping the order of the setup. The check for "do
we need to get the object info" must come _after_ we've decided
whether we need to look up the type.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent change to make git-completion.bash use $__git_cmd_idx
in more places broke a number of completions on zsh because it
modified __git_main but did not update __git_zsh_main.
Notably, completions for "add", "branch", "mv" and "push" were
broken as a result of this change.
In addition to the undefined variable usage, "git mv <tab>" also
prints the following error:
__git_count_arguments:7: bad math expression:
operand expected at `"1"'
_git_mv:[:7: unknown condition: -gt
Remove the quotes around $__git_cmd_idx in __git_count_arguments
and set __git_cmd_idx=1 early in __git_zsh_main to fix the
regressions from 59d85a2a05.
This was tested on zsh 5.7.1 (x86_64-apple-darwin19.0).
Suggested-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Acked-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Contributor for a new language must complete translations of a small set
of l10n messages.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Document the PO helper program (git-po-helper) with installation and
basic usage.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
All we need to know is that their names are the same.
Additionally this might be easier to parse for some since
remote_for_branch is more descriptive than remote_get(NULL).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a single line that is used in a single place, and the variable has
the same name as the function.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If fetch_remote is NULL (i.e. the branch remote is invalid), then it
can't possibly be same as remote, which can't be NULL.
The check is redundant, and so is the extra variable.
Also, fix the Yoda condition: we want to check if remote is the same as
the branch remote, not the other way around.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only override dst on the odd case.
This allows a preemptive break on the `simple` case.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Their code is much simpler now and can move into the parent function.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the setup_push_* functions are appending a refspec. Do this only
once on the parent function.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want all the cases that don't do anything with a branch first, and
then the rest. That way we will be able to get the branch and die if
there's a problem in the parent function, instead of inside the function
of each mode.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to break when nothing else will be executed.
Will help next patches.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code is duplicated among multiple functions.
No functional changes.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the code has been simplified and it's clear what it's
actually doing, update the documentation to reflect that.
Namely; the simple mode only barfs when working on a centralized
workflow, and there's no configured upstream branch with the same name.
Cc: Elijah Newren <newren@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a safety check to make sure branch->refname isn't different
from branch->merge[0]->src, otherwise we die().
Therefore we always push to branch->refname.
Suggestions-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simply move the code around and remove dead code. In particular the
'!same_remote' conditional is a no-op since that part of the code is the
same_remote leg of the conditional beforehand.
No functional changes.
Suggestions-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to avoid doing unnecessary things and simplify it in further
patches. In particular moving the additional name safety out of
setup_push_upstream() and into setup_push_simple() and thus making both
more straightforward.
The code is copied exactly as-is; no functional changes.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`simple` is the most important mode so move the relevant code to its own
function to make it easier to see what it's doing.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The typical case is what git was designed for: distributed remotes.
It's only the atypical case--fetching and pushing to the same
remote--that we need to keep an eye on.
No functional changes.
Liked-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Packing refs (and therefore checking that certain refs are not packed)
is a property of the packed/loose ref storage. Add a comment to explain
what the test checks.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable, hashes are correctly formed by design.
Split off test for git-log in empty repo.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given that git-maintenance simply calls out git-pack-refs, it seems superfluous
to test the functionality of pack-refs itself, as that is covered by
t3210-pack-refs.sh.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packed/loose ref storage is an overlay combination of packed-refs (refs and
tags in a single file) and one-file-per-ref. This creates all kinds of edge
cases related to directory/file conflicts, (non-)empty directories, and the
locking scheme, none of which applies to reftable.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable, there is no notion of a per-ref 'existence' of a reflog. Each
reflog entry has its own key, so it is not possible to distinguish between
{reflog doesn't exist,reflog exists but is empty}. This makes the logic
in log_ref_setup() (file refs/files-backend.c), which depends on the existence
of the reflog file infeasible.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test checks what happens if reflog and ref database disagree on the state of
the latest commit. This seems to require accessing reflog storage directly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add extensive comment why this test needs a REFFILES annotation.
I tried forcing universal reflog creation with core.logAllRefUpdates=true, but
that apparently also doesn't cause reflogs to be created for pseudorefs
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
REFFILES can be used to mark tests that are specific to the packed/loose ref
storage format and its limitations. Marking such tests is a preparation for
introducing the reftable storage backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test checks that unreachable objects are really removed. For the test to
work, it has to ensure that no reflog retain any reachable objects.
Previously, it did this by manipulating the file system to remove reflog in the
first test, and relying on git not updating the reflog if the relevant logfile
doesn't exist in follow-up tests.
Now, explicitly clear the reflog using 'reflog expire'. This reduces the
dependency between test functions. It also is more amenable to use with
reftable, which has no concept of (non)-existence of a reflog
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the test independent of the particulars of the storage formats.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use symbolic-ref and rev-parse to inspect refs.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will print $ZERO_OID when asking for a non-existent ref from the
test-helper.
Since resolve-ref provides direct access to refs_resolve_ref_unsafe(), it
provides a reliable mechanism for accessing REFNAME, while avoiding the implicit
resolution to refs/heads/REFNAME.
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reftable will prohibit invalid hashes at the storage level, but
git-symbolic-ref can still create branches ending in ".lock".
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a warning on AIX's xlc compiler that's been emitted since my
a1aad71601 (fsck.h: use "enum object_type" instead of "int",
2021-03-28):
"builtin/fsck.c", line 805.32: 1506-068 (W) Operation between
types "int(*)(struct object*,enum object_type,void*,struct
fsck_options*)" and "int(*)(struct object*,int,void*,struct
fsck_options*)" is not allowed.
I.e. it complains about us assigning a function with a prototype "int"
where we're expecting "enum object_type".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This flag was introduced in 2477ab2e (diff: support anchoring line(s),
2017-11-27) but back then, the bash completion script did not learn
about the new flag. Add it.
Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize the startup time of git-send-email by using an amended
config_regexp() function to retrieve the list of config keys and
values we're interested in.
For boolean keys we can handle the [true|false] case ourselves, and
the "--get" case didn't need any parsing. Let's leave "--path" and
other "--bool" cases to "git config". I'm not bothering with the
"undef" or "" case (true and false, respectively), let's just punt on
those and others and have "git config --type=bool" handle it.
The "grep { defined } @values" here covers a rather subtle case. For
list values such as sendemail.to it is possible as with any other
config key to provide a plain "-c sendemail.to", i.e. to set the key
as a boolean true. In that case the Git::config() API will return an
empty string, but this new parser will correctly return "undef".
However, that means we can end up with "undef" in the middle of a
list. E.g. for sendemail.smtpserveroption in conjuction with
sendemail.smtpserver as a path this would have produce a warning. For
most of the other keys we'd behave the same despite the subtle change
in the value, e.g. sendemail.to would behave the same because
Mail::Address->parse() happens to return an empty list if fed
"undef". For the boolean values we were already prepared to handle
these variables being initialized as undef anyway.
This brings the runtime of "git send-email" from ~60-~70ms to a very
steady ~40ms on my test box. We now run just one "git config"
invocation on startup instead of 8, the exact number will differ based
on the local sendemail.* config. I happen to have 8 of those set.
This brings the runtime of t9001-send-email.sh from ~13s down to ~12s
for me. The change there is less impressive as many of those tests set
various config values, and we're also getting to the point of
diminishing returns for optimizing "git send-email" itself.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been pointed out[1] that cwd() invokes "pwd(1)" while getcwd()
is a Perl-native XS function. For what we're using these for we can
use getcwd().
The performance difference is miniscule, we're saving on the order of
a millisecond or so, see [2] below for the benchmark. I don't think
this matters in practice for optimizing git-send-email or perl
execution (unlike the patches leading up to this one).
But let's do it regardless of that, if only so we don't have to think
about this as a low-hanging fruit anymore.
1. https://lore.kernel.org/git/20210512180517.GA11354@dcvr/
2.
$ perl -MBenchmark=:all -MCwd -wE 'cmpthese(10000, { getcwd => sub { getcwd }, cwd => sub { cwd }, pwd => sub { system "pwd >/dev/null" }})'
(warning: too few iterations for a reliable count)
Rate pwd cwd getcwd
pwd 982/s -- -48% -100%
cwd 1890/s 92% -- -100%
getcwd 10000000000000000000/s 1018000000000000000% 529000000000000064% -
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of unconditionally requiring modules such as File::Spec, let's
only load them when needed. This speeds up code that only needs a
subset of the features Git.pm provides.
This brings a plain invocation of "git send-email" down from 52/37
loaded modules under NO_GETTEXT=[|Y] to 39/18, and it now takes
~60-~70ms instead of ~80-~90ms. The runtime of t9001-send-email.sh
test is down to ~13s from ~15s.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize the time git-send-email takes to do even the simplest of
things (such as serving up "-h") from around ~150ms to ~80ms-~90ms by
lazily loading the modules it requires.
Before this change Devel::TraceUse would report 99/97 used modules
under NO_GETTEXT=[|Y], respectively. Now it's 52/37. It now takes ~15s
to run t9001-send-email.sh, down from ~20s.
Changing File::Spec::Functions::{catdir,catfile} to invoking class
methods on File::Spec itself is idiomatic. See [1] for a more
elaborate explanation, the resulting code behaves the same way, just
without the now-pointless function wrapper.
1. http://lore.kernel.org/git/8735u8mmj9.fsf@evledraar.gmail.com
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change indirect object syntax such as "new X ARGS" to
"X->new(ARGS)". This allows perl to see what "new" is at compile-time
without having loaded Term::ReadLine. This doesn't matter now, but
will in a subsequent commit when we start lazily loading it.
Let's do the same for the adjacent "FakeTerm" package for consistency,
even though we're not going to conditionally load it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change calls like "__ 'foo'" to "__('foo')" so the Perl compiler
doesn't have to guess that "__" is a function. This makes the code
more readable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize git-send-email by only shelling out to "git var" if we need
to. This is easily done by re-inventing our own small version of
perl's Memoize module.
I suppose I could just use Memoize itself, but in a subsequent patch
I'll be micro-optimizing send-email's use of dependencies. Using
Memoize is a measly extra 5-10 milliseconds, but as we'll see that'll
end up mattering for us in the end.
This brings the runtime of a plain "send-email" from around ~160-170ms
to ~140m-150ms. The runtime of the tests is around the same, or around
~20s.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce the time it takes git-send-email to get to even the most
trivial of tasks (such as serving up its "-h" output) by first listing
config keys that exist, and only then only call e.g. "git config
--bool" on them if they do.
Over a lot of runs this speeds the time to "-h" up for me from ~250ms
to ~150ms, and the runtime of t9001-send-email.sh goes from ~25s to
~20s.
This introduces a race condition where we'll do the "wrong" thing if a
config key were to be inserted between us discovering the list and
calling read_config(), i.e. we won't know about the racily added
key. In theory this is a change in behavior, in practice it doesn't
matter.
The config_regexp() function being changed here was added in
dd84e528a3 (git-send-email: die if sendmail.* config is set,
2020-07-23) for use by git-send-email. So we can change its odd return
value in the case where no values are found by "git config". The
difference in the *.pm code would matter if it was invoked in scalar
context, but now it no longer is.
Arguably this caching belongs in Git.pm itself, but in lieu of
modifying it for all its callers let's only do this for "git
send-email". The other big potential win would be "git svn", but
unlike "git send-email" it doesn't check tens of config variables one
at a time at startup (in my brief testing it doesn't check any).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config_regexp() function was added in dd84e528a3 (git-send-email:
die if sendmail.* config is set, 2020-07-23) for use in
git-send-email, and it's the only in-tree user of it.
However, the consensus is that Git.pm is a public interface, so even
though it's a recently added function we can't change it. So let's
copy over a minimal version of it to git-send-email.perl itself. In a
subsequent commit it'll be changed further for our own use.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the removal of the support for sendemail.smtpssl in the preceding
commit the parsing of sendemail.smtpencryption is no longer special,
and can by moved to %config_settings.
This gets us rid of an unconditional call to Git::config(), which as
we'll see in subsequent commits matters for startup performance.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the already dead code to support "sendemail.smtpssl" by finally
removing the dead code supporting the configuration option.
In f6bebd121a (git-send-email: add support for TLS via
Net::SMTP::SSL, 2008-06-25) the --smtp-ssl command-line option was
documented as deprecated, later in 65180c6618 (List send-email config
options in config.txt., 2009-07-22) the "sendemail.smtpssl"
configuration option was also documented as such.
Then in in 3ff15040e2 (send-email: fix regression in
sendemail.identity parsing, 2019-05-17) I unintentionally removed
support for it by introducing a bug in read_config().
As can be seen from the diff context we've already returned unless
$enc i defined, so it's not possible for us to reach the "elsif"
branch here. This code was therefore already dead since Git v2.23.0.
So let's just remove it. We were already 11 years into a stated
deprecation period of this variable when 3ff15040e2 landed, now it's
around 13. Since it hasn't worked anyway for around 2 years it looks
like we can safely remove it.
The --smtp-ssl option is still deprecated, if someone cares they can
follow-up and remove that too, but unlike the config option that one
could still be in use in the wild. I'm just removing this code that's
provably unused already.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Git.pm code does its own Perl-ifying of boolean variables, let's
ensure that empty values = true for boolean variables, as in the C
code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for the "GIT_TEST_PERL_FATAL_WARNINGS=true" test mode to
"send-email". This was added to e.g. git-svn in 5338ed2b26 (perl:
check for perl warnings while running tests, 2020-10-21), but not
"send-email". Let's rectify that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix-up to a topic that is already in 'master'.
* en/dir-traversal:
dir: introduce readdir_skip_dot_and_dotdot() helper
dir: update stale description of treat_directory()
Revert "dir: update stale description of treat_directory()"
Revert "dir: introduce readdir_skip_dot_and_dotdot() helper"
Many places in the code were doing
while ((d = readdir(dir)) != NULL) {
if (is_dot_or_dotdot(d->d_name))
continue;
...process d...
}
Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
...process d...
}
This helper particularly simplifies checks for empty directories.
Also use this helper in read_cached_dir() so that our statistics are
consistent across platforms. (In other words, read_cached_dir() should
have been using is_dot_or_dotdot() and skipping such entries, but did
not and left it to treat_path() to detect and mark such entries as
path_none.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.
Update the comments for treat_directory() to use these flag names rather
than the old member names.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"ld" on Solaris fails to link some test helpers, which has been
worked around by reshuffling the inline function definitions from a
header file to a source file that is the only user of them.
* ab/pack-linkage-fix:
pack-objects: move static inline from a header to the sole consumer
Workaround flaky tests introduced recently.
* ds/t1092-fix-flake-from-progress:
t1092: revert the "-1" hack for emulating "no progress meter"
t1092: use GIT_PROGRESS_DELAY for consistent results
Move the code that is only used in builtin/pack-objects.c out of
pack-objects.h.
This fixes an issue where Solaris's SunCC hasn't been able to compile
git since 483fa7f42d (t/helper/test-bitmap.c: initial commit,
2021-03-31).
The real origin of that issue is that in 898eba5e63 (pack-objects:
refer to delta objects by index instead of pointer, 2018-04-14)
utility functions only needed by builtin/pack-objects.c were added to
pack-objects.h. Since then the header has been used in a few other
places, but 483fa7f42d was the first time it was used by test helper.
Since Solaris is stricter about linking and the oe_get_size_slow()
function lives in builtin/pack-objects.c the build started failing
with:
Undefined first referenced
symbol in file
oe_get_size_slow t/helper/test-bitmap.o
ld: fatal: symbol referencing errors. No output written to t/helper/test-tool
On other platforms this is presumably OK because the compiler and/or
linker detects that the "static inline" functions that reference
oe_get_size_slow() aren't used.
Let's solve this by moving the relevant code from pack-objects.h to
builtin/pack-objects.c. This is almost entirely a code-only move, but
because of the early macro definitions in that file referencing some
of these inline functions we need to move the definition of "static
struct packing_data to_pack" earlier, and declare these inline
functions above the macros.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t2080 makes a few copies of a test repository and later performs a
branch switch on each one of the copies to verify that parallel checkout
and sequential checkout produce the same results. However, the
repository is copied with `cp -R` which, on some systems, defaults to
following symlinks on the directory hierarchy and copying their target
files instead of copying the symlinks themselves. AIX is one example of
system where this happens. Because the symlinks are not preserved, the
copied repositories have paths that do not match what is in the index,
causing git to abort the checkout operation that we want to test. This
makes the test fail on these systems.
Fix this by copying the repository with the POSIX flag '-P', which
forces cp to copy the symlinks instead of following them. Note that we
already use this flag for other cp invocations in our test suite (see
t7001). With this change, t2080 now passes on AIX.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the newly added "hooks_path" API in Git.pm to its only user in
git-send-email.perl. This was added in c8243933c7 (git-send-email:
Respect core.hooksPath setting, 2021-03-23), meaning that it hasn't
yet made it into a non-rc release of git.
The consensus with Git.pm is that we need to be considerate of
out-of-tree users who treat it as a public documented interface. We
should therefore be less willing to add new functionality to it, least
we be stuck supporting it after our own uses for it disappear.
In this case the git-send-email.perl hook invocation will probably be
replaced by a future "git hook run" command, and in the commit
preceding this one the "hooks_path" become nothing but a trivial
wrapper for "rev-parse --git-path hooks" anyway (with no
Cwd::abs_path() call), so let's just inline this command in
git-send-email.perl itself.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c8243933c7 (git-send-email: Respect core.hooksPath setting,
2021-03-23) we started supporting core.hooksPath in "send-email". It's
been reported that on Windows[1] doing this by calling abs_path()
results in different canonicalizations of the absolute path.
This wasn't an issue in c8243933c7 itself, but was revealed by my
ea7811b37e (git-send-email: improve --validate error output,
2021-04-06) when we started emitting the path to the hook, which was
previously only internal to git-send-email.perl.
The just-landed 53753a37d0 (t9001-send-email.sh: fix expected
absolute paths on Windows, 2021-05-24) narrowly fixed this issue, but
I believe we can do better here. We should not be relying on whatever
changes Perl's abs_path() makes to the path "rev-parse --git-path
hooks" hands to us. Let's instead trust it, and hand it to Perl's
system() in git-send-email.perl. It will handle either a relative or
absolute path.
So let's revert most of 53753a37d0 and just have "hooks_path" return
what we get from "rev-parse" directly without modification. This has
the added benefit of making the error message friendlier in the common
case, we'll no longer print an absolute path for repository-local hook
errors.
1. http://lore.kernel.org/git/bb30fe2b-cd75-4782-24a6-08bb002a0367@kdbg.org
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This looked like a good idea, but it seems to break tests on 32-bit
builds rather badly. Revert to just use "100 thousands must be big
enough" for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The t1092-sparse-checkout-compatibility.sh tests compare the stdout and
stderr for several Git commands across both full checkouts, sparse
checkouts with a full index, and sparse checkouts with a sparse index.
Since these are direct comparisons, sometimes a progress indicator can
flush at unpredictable points, especially on slower machines. This
causes the tests to be flaky.
One standard way to avoid this is to add GIT_PROGRESS_DELAY=0 to the Git
commands that are run, as this will force every progress indicator
created with start_progress_delay() to be created immediately. However,
there are some progress indicators that are created in the case of a
full index that are not created with a sparse index. Moreover, their
values may be different as those indexes have a different number of
entries.
Instead, use GIT_PROGRESS_DELAY=-1 (which will turn into UINT_MAX)
to ensure that any reasonable machine running these tests would
never display delayed progress indicators.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to read the init.templateDir setting at builtin/init-db.c using
a git_config() callback that, in turn, called git_config_pathname(). To
simplify the config reading logic at this file and plug a memory leak,
this was replaced by a direct call to git_config_get_value() at
e4de4502e6 ("init: remove git_init_db_config() while fixing leaks",
2021-03-14). However, this function doesn't provide path expanding
semantics, like git_config_pathname() does, so paths with '~/' and
'~user/' are treated literally. This makes 'git init' fail to handle
init.templateDir paths using these constructs:
$ git config init.templateDir '~/templates_dir'
$ git init
'warning: templates not found in ~/templates_dir'
Replace the git_config_get_value() call by git_config_get_pathname(),
which does the '~/' and '~user/' expansions. Also add a regression test.
Note that unlike git_config_get_value(), the config cache does not own
the memory for the path returned by git_config_get_pathname(), so we
must free() it.
Reported on IRC by rkta.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a regression with the "the editor exited uncleanly, aborting
everything" error message going missing after my
d21616c039 (git-send-email: refactor duplicate $? checks into a
function, 2021-04-06).
I introduced a $msg variable, but did not actually use it. This caused
us to miss the optional error message supplied by the "do_edit"
codepath. Fix that, and add tests to check that this works.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git for Windows is a native Windows program that works with native
absolute paths in the drive letter style C:\dir. The auxiliary
infrastructure is based on MSYS2, which uses POSIX style /C/dir.
When we test for output of absolute paths produced by git.exe, we
usally have to expect C:\dir style paths. To produce such expected
paths, we have to use $(pwd) in the test scripts; the alternative,
$PWD, produces a POSIX style path. ($PWD is a shell variable, and the
shell is bash, an MSYS2 program, and operates in the POSIX realm.)
There are two recently added tests that were written to expect C:\dir
paths. The output that is tested is produced by `git send-email`, but
behind the scenes, this is a Perl script, which also works in the
POSIX realm and produces /C/dir style output.
In the first test case that is changed here, replace $(pwd) by $PWD
so that the expected path is constructed using /C/dir style.
The second test case sets core.hooksPath to an absolute path. Since
the test script talks to native git.exe, it is supposed to place a
C:/dir style path into the configuration; therefore, keep $(pwd).
When this configuration value is consumed by the Perl script, it is
transformed to /C/dir style by the MSYS2 layer and echoed back in
this form in the error message. Hence, do use $PWD for the expected
value.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the 'uploadpack.blobPackfileUri' description in packfile-uri.txt
and the correct format also can be seen in t5702.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently GNU make already removes files when catching an interruption
signal, however, in order to deal with other kinds of errors a
workaround is in place to store target output to a temporary file, and
only move it to its right place on success.
By enabling the built-in .DELETE_ON_ERROR we let make do this task, so
we don't have to.
This way the rules can be simplified a lot.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits 50cff52f1a (When generating manpages, delete outdated targets
first., 2007-08-02) and f9286765b2 (Documentation/Makefile: remove
cmd-list.made before redirecting to it., 2007-08-06) created these rm
instances for a very rare corner-case: building as root by mistake.
It's odd to have workarounds here, but nowhere else in the Makefile--
which already fails in this stuation, starting from
Documentation/technical/.
We gain nothing but complexity, so let's remove them.
Comments-by: Jeff King <peff@peff.net>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
asciidoc needs asciidoc.conf, asciidoctor asciidoctor-extensions.rb.
Neither needs the other.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Another brown paper bag inconsistency fix for a new feature
introduced during this cycle.
* dl/stash-show-untracked-fixup:
stash show: use stash.showIncludeUntracked even when diff options given
The "simple-ipc" did not compile without pthreads support, but the
build procedure was not properly account for it.
* jh/simple-ipc-sans-pthread:
simple-ipc: correct ifdefs when NO_PTHREADS is defined
The "rev-parse" command did not diagnose the lack of argument to
"--path-format" option, which was introduced in v2.31 era, which
has been corrected.
* wm/rev-parse-path-format-wo-arg:
rev-parse: fix segfault with missing --path-format argument
Despite that tar is available everywhere, it's not required by POSIX.
In our build system, users are allowed to specify which tar to be used
in Makefile knobs. Furthermore, GNU tar (gtar) is prefered when autotools
is being used.
In our testsuite, 7 out of 9 tar-required-tests use "$TAR", the other
two use "tar".
Let's change the remaining two tests to "$TAR".
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If options pertaining to how the diff is displayed is provided to
`git stash show`, the command will ignore the stash.showIncludeUntracked
configuration variable, defaulting to not showing any untracked files.
This is unintuitive behaviour since the format of the diff output and
whether or not to display untracked files are orthogonal.
Use stash.showIncludeUntracked even when diff options are given. Of
course, this is still overridable via the command-line options.
Update the documentation to explicitly say which configuration variables
will be overridden when a diff options are given.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix long standing inconsistency between -c/--cc that do imply -p on
one side, and -m that did not imply -p on the other side.
Change corresponding test accordingly, as "log -m" output should now
match one from "log -m -p", rather than from just "log".
Change documentation accordingly.
NOTES:
After this patch
git log -m
produces diffs without need to provide -p as well, that improves both
consistency and usability. It gets even more useful if one sets
"log.diffMerges" configuration variable to "first-parent" to force -m
produce usual diff with respect to first parent only.
This patch, however, does not change behavior when specific diff
format is explicitly provided on the command-line, so that commands
like
git log -m --raw
git log -m --stat
are not affected, nor does it change commands where specific diff
format is active by default, such as:
git diff-tree -m
It's also worth to be noticed that exact historical semantics of -m is
still provided by --diff-merges=separate.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is refactoring change in preparation for the next commit that
will let -m imply -p.
The old name doesn't match the intention to let not only -c/-cc imply
-p, but also -m, that is not a "combined" format, so we rename the
flag accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Passing "-m" in "git log --first-parent -m" is not needed as
--first-parent implies --diff-merges=first-parent anyway. OTOH, it
will stop being harmless once we let "-m" imply "-p".
While we are at it, fix corresponding test description in t3903-stash
to match what it actually tests.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rev-list doesn't utilize -m. It happens to eat it silently, so this
bug went unnoticed.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move specific handling of "-m" for diff-index to diff-index.c, so
diff-merges is left to handle only diff for merges options.
Being a better design by itself, this is especially essential in
preparation for letting -m imply -p, as "diff-index -m" obviously
should not imply -p, as it's entirely unrelated.
To handle this, in addition to moving specific diff-index "-m" code
out of diff-merges, we introduce new
diff_merges_suppress_options_parsing()
and call it before generic options processing in cmd_diff_index().
This new diff_merges_suppress_options_parsing() could then be reused
and called before invocations of setup_revisions() for other commands
that don't need --diff-merges options, but that's outside of the scope
of these patch series.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-m in "git diff-index" means "match missing", that differs
from its meaning in "git diff". Let's check it in diff-index.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to ensure we don't affect plumbing commands with our changes
of "-m" semantics, so add corresponding test.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to ensure we won't break different diff formats when we start
to imply "-p" by "-m".
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to ensure we won't break different diff formats when we start
to imply "-p" by "-m".
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to notice current behavior that we are going to change when we
start to imply "-p" by "-m".
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Classic string concatenation while forgetting a space character.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor tr2_dst_try_uds_connect() to avoid a gcc warning[1] that
appears under -O3 (but not -O2). This makes the build pass under
DEVELOPER=1 without needing a DEVOPTS=no-error.
This can be reproduced with GCC Debian 8.3.0-6, but not e.g. with
clang 7.0.1-8+deb10u2. We've had this warning since
ee4512ed48 (trace2: create new combined trace facility, 2019-02-22).
As noted in [2] this warning happens because the compiler doesn't
assume that errno must be non-zero after a failed syscall.
Let's work around by using the well-established "saved_errno" pattern,
along with returning -1 ourselves instead of "errno". The caller can
thus rely on our "errno" on failure.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61846 for a related
bug report against GCC.
1.
trace2/tr2_dst.c: In function ‘tr2_dst_get_trace_fd.part.5’:
trace2/tr2_dst.c:296:10: warning: ‘fd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dst->fd = fd;
~~~~~~~~^~~~
trace2/tr2_dst.c:229:6: note: ‘fd’ was declared here
int fd;
^~
2. https://lore.kernel.org/git/20200404142131.GA679473@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regression fix for a change made during this cycle.
* cs/http-use-basic-after-failed-negotiate:
Revert "remote-curl: fall back to basic auth if Negotiate fails"
t5551: test http interaction with credential helpers
When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted. To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on. See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused". This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s
mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s
just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms
That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't. If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:
Before Basename Series After Just This Series
no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s
mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s
Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.
The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.
However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result. If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization. Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed. Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As documented in Documentation/technical/remembering-renames.txt, and as
tested for in the two testcases in t6429 with "rename same file
identically" in their description, there is one case where we need to
have renames in one commit NOT be cached for the next commit in our
rebase sequence -- namely, rename/rename(1to1) cases. Rather than
specifically trying to uncache those and fix up dir_rename_counts() to
match (which would also be valid but more work), we simply disable the
optimization when this really rare type of rename occurs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we have a usable rename cache, then we can remove from
relevant_sources all the paths that were cached;
diffcore_rename_extended() can then consider an even smaller set of
relevant_sources in its rename detection.
However, when diffcore_rename_extended() is done, we will need to take
the renames it detected and then add back in all the ones we had cached
from before.
Add helper functions for doing these two operations; the next commit
will make use of them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous commits created an in-memory cache of the results of rename
detection, and added logic to detect when that cache could appropriately
be used in a subsequent merge operation -- but we were still
unconditionally clearing the cache with each new merge operation anyway.
If it is valid to reuse the cache from one of the two sides of history,
preserve that side.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, callers of the merge-ort API could have passed an
uninitialized value for struct merge_result *result. However, we want
to check result to see if it has cached renames from a previous merge
that we can reuse; such values would be found behind result->priv.
However, if result->priv is uninitialized, attempting to access behind
it will give a segfault. So, we need result->priv to be NULL (which
will be the case if the caller does a memset(&result, 0)), or be written
by a previous call to the merge-ort machinery. Documenting this
requirement may help, but despite being the person who introduced this
requirement, I still missed it once and it did not fail in a very clear
way and led to a long debugging session.
Add a _properly_initialized field to merge_result; that value will be
0 if the caller zero'ed the merge_result, it will be set to a very
specific value by a previous run by the merge-ort machinery, and if it's
uninitialized it will most likely either be 0 or some value that does
not match the specific one we'd expect allowing us to throw a much more
meaningful error.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to know when renames detected in a previous merge operation can
be reused in a later merge operation. Consider the following setup
(from the git-rebase manpage):
A---B---C topic
/
D---E---F---G master
After rebasing, this will appear as:
A'--B'--C' topic
/
D---E---F---G master
Further, let's say that 'oldfile' was renamed to 'newfile' between E
and G. The rebase or cherry-pick of A onto G will involve a three-way
merge between E (as the merge base) and G and A. After detecting the
rename between E:oldfile and G:newfile, there will be a three-way
content merge of the following:
E:oldfile
G:newfile
A:oldfile
and produce a new result:
A':newfile
Now, when we want to pick B onto A', we will need to do a three-way
merge between A (as the merge-base) and A' and B. This will involve
a three-way content merge of
A:oldfile
A':newfile
B:oldfile
but only if we can detect that A:oldfile is similar enough to A':newfile
to be used together in a three-way content merge, i.e. only if we can
detect that A:oldfile and A':newfile are a rename. But we already know
that A:oldfile and A':newfile are similar enough to be used in a
three-way content merge, because that is precisely where A':newfile came
from in the previous merge.
Note that A & A' both appear in both merges. That gives us the
condition under which we can reuse renames.
There are a couple important points about this optimization:
- If the rebase or cherry-pick halts for user conflicts, these caches
are NOT saved anywhere. Thus, resuming a halted rebase or
cherry-pick will result in no reused renames for the next commit.
This is intentional, as user resolution can change files
significantly and in ways that violate the similarity assumptions
here.
- Technically, in a *very* narrow case this might give slightly
different results for rename detection. Using the example above,
if:
* E:oldfile had 20 lines
* G:newfile added 10 new lines at the beginning of the file
* A:oldfile deleted all but the first three lines of the file
then
=> A':newfile would have 13 lines, 3 of which matches those
in A:oldfile.
Consider the two cases:
* Without this optimization:
- the next step of the rebase operation (moving B to B')
would not detect the rename betwen A:oldfile and A':newfile
- we'd thus get a modify/delete conflict with the rebase
operation halting for the user to resolve, and have both
A':newfile and B:oldfile sitting in the working tree.
* With this optimization:
- the rename between A:oldfile and A':newfile would be detected
via the cache of renames
- a three-way merge between A:oldfile, A':newfile, and B:oldfile
would commence and be written to A':newfile
Now, is the difference in behavior a bug...or a bugfix? I can't
tell. Given that A:oldfile and A':newfile are not very similar,
when we three-way merge with B:oldfile it seems likely we'll hit a
conflict for the user to resolve. And it shouldn't be too hard for
users to see why we did that three-way merge; oldfile and newfile
*were* renames somewhere in the sequence. So, most of these corner
cases will still behave similarly -- namely, a conflict given to the
user to resolve. Also, consider the interesting case when commit B
is a clean revert of commit A. Without this optimization, a rebase
could not both apply a weird patch like A and then immediately
revert it; users would be forced to resolve merge conflicts. With
this optimization, it would successfully apply the clean revert.
So, there is certainly at least one case that behaves better. Even
if it's considered a "difference in behavior", I think both behaviors
are reasonable, and the time savings provided by this optimization
justify using the slightly altered rename heuristics.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fill in cache_pairs, cached_target_names, and cached_irrelevant based on
rename detection results. Future commits will make use of these values.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there are many renames between the old base of a series of commits
and the new base for a series of commits, the sequence of merges
employed to transplant those commits (from a cherry-pick or rebase
operation) will repeatedly detect the exact same renames. This is
wasted effort.
Add data structures which will be used to cache rename detection
results, along with the initialization and deallocation of these data
structures. Future commits will populate these caches, detect the
appropriate circumstances when they can be used, and employ them to
avoid re-detecting the same renames repeatedly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will soon be adding an optimization that caches (in memory only,
never written to disk) upstream renames during a sequence of merges such
as occurs during a cherry-pick or rebase operation. Add several tests
meant to stress such an implementation to ensure it does the right
thing, and include a test whose outcome we will later change due to this
optimization as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, when fast-rebase hit a conflict, it simply aborted and left
HEAD, the index, and the working tree where they were before the
operation started. While fast-rebase does not support restarting from a
conflicted state, write the conflicted state out anyway as it gives us a
way to see what the conflicts are and write tests that check for them.
This will be important in the upcoming commits, because sequencer.c is
only superficially integrated with merge-ort.c; in particular, it calls
merge_switch_to_result() after EACH merge instead of only calling it at
the end of all the sequence of merges (or when a conflict is hit). This
not only causes needless updates to the working copy and index, but also
causes all intermediate data to be freed and tossed, preventing caching
information from one merge to the next. However, integrating
sequencer.c more deeply with merge-ort.c is a big task, and making this
small extension to fast-rebase.c provides us with a simple way to test
the edge and corner cases that we want to make sure continue working.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
assert() can succinctly document expectations for the code, and do so in
a way that may be useful to future folks trying to refactor the code and
change basic assumptions; it allows them to more quickly find some
places where their violations of previous assumptions trips things up.
Unfortunately, assert() can surround a function call with important
side-effects, which is a huge mistake since some users will compile with
assertions disabled. I've had to debug such mistakes before in other
codebases, so I should know better. Luckily, this was only in test
code, but it's still very embarrassing. Change an assert() to an if
(...) BUG (...).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remembering renames on the upstream side of history in an early merge of
a rebase or cherry-pick for re-use in a latter merge of the same
operation makes pretty good intuitive sense. However, trying to show
that it doesn't cause some subtle behavioral difference or some funny
edge or corner case is much more involved. And, in fact, it does
introduce a subtle behavioral change.
Document all the assumptions, special cases, and logic involved in such
an optimization, and describe why this optimization is safe under the
current optimizations/features/etc. -- even when the subtle behavioral
change is triggered.
Part of the point of adding this document that goes over the
optimization in such laborious detail, is that it is possible that
significant future changes (optimizations or feature changes) could
interact with this optimization in interesting ways; this document is
here to help folks making big changes sanity check that the assumptions
and arguments underlying this optimization are still valid. (As a side
note, creating this document forced me to review things in sufficient
detail that I found I was not properly caching directory-rename-induced
renames, resulting in the code not being aware of those renames and
causing unnecessary diffcore_rename_extended() calls in subsequent
merges.)
A subsequent commit will add several testcases based on this document
meant to stress-test the implementation and also document the case with
the subtle behavioral change.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation for color.pager is technically correct, but
slightly misleading and doesn't really clarify the purpose of the
variable. As explained in the original thread which added it:
https://lore.kernel.org/git/E1G6zPH-00062L-Je@moooo.ath.cx/
the point is to deal with pagers that don't understand colors. And hence it
being set to "true" is necessary for colorizing output to the pager, but
not sufficient by itself (you must also have enabled one of the other
color options, though note that these are set to "auto" by default these
days).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Find typos in "po/TEAMS" file using the "git-po-helper" program. These
typos were introduced from commit v2.24.0-1-g9917eca794 (l10n: zh_TW:
add translation for v2.24.0, 2019-11-20 19:14:22 +0800).
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
The "chainlint" feature in the test framework is a handy way to
catch common mistakes in writing new tests, but tends to get
expensive. An knob to selectively disable it has been introduced
to help running tests that the developer has not modified.
* jk/test-chainlint-softer:
t: avoid sed-based chain-linting in some expensive cases
The handling of "%(push)" formatting element of "for-each-ref" and
friends was broken when the same codepath started handling
"%(push:<what>)", which has been corrected.
* zh/ref-filter-push-remote-fix:
ref-filter: fix read invalid union member bug
"git clone" from SHA256 repository by Git built with SHA-1 as the
default hash algorithm over the dumb HTTP protocol did not
correctly set up the resulting repository, which has been corrected.
* ew/sha256-clone-remote-curl-fix:
remote-curl: fix clone on sha256 repos
"git clean" and "git ls-files -i" had confusion around working on
or showing ignored paths inside an ignored directory, which has
been corrected.
* en/dir-traversal:
dir: introduce readdir_skip_dot_and_dotdot() helper
dir: update stale description of treat_directory()
dir: traverse into untracked directories if they may have ignored subfiles
dir: avoid unnecessary traversal into ignored directory
t3001, t7300: add testcase showcasing missed directory traversal
t7300: add testcase showing unnecessary traversal into ignored directory
ls-files: error out on -i unless -o or -c are specified
dir: report number of visited directories and paths with trace2
dir: convert trace calls to trace2 equivalents
Use -1 as error return value throughout.
This removes spurious differences in the GIT_TRACE_REFS output, depending on the
ref storage backend active.
Before, the cached ref_iterator (but only that iterator!) would return
peel_object() output directly. No callers relied on the peel_status values
beyond success/failure. All calls to these functions go through
peel_iterated_oid(), which returns peel_object() as a fallback, but also
squashing the error values.
The iteration interface already passes REF_ISSYMREF and REF_ISBROKEN through the
flags argument, so the additional error values in enum peel_status provide no
value.
The ref iteration interface provides a separate peel() function because certain
formats (eg. packed-refs and reftable) can store the peeled object next to the
tag SHA1. Passing the peeled SHA1 as an optional argument to each_ref_fn maps
more naturally to the implementation of ref databases. Changing the code in this
way is left for a future refactoring.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching with the v0 protocol over ssh (or a local upload-pack with
pipes), the server closes the connection as soon as it is finished
sending the pack. So even though the client may still be operating on
the data via index-pack (e.g., resolving deltas, checking connectivity,
etc), the server has released all resources.
With the v2 protocol, however, the server considers the ssh session only
as a transport, with individual requests coming over it. After sending
the pack, it goes back to its main loop, waiting for another request to
come from the client. As a result, the ssh session hangs around until
the client process ends, which may be much later (because resolving
deltas, etc, may consume a lot of CPU).
This is bad for two reasons:
- it's consuming resources on the server to leave open a connection
that won't see any more use
- if something bad happens to the ssh connection in the meantime (say,
it gets killed by the network because it's idle, as happened in a
real-world report), then ssh will exit non-zero, and we'll propagate
the error up the stack.
The server is correct here not to hang up after serving the pack. The v2
protocol's design is meant to allow multiple requests like this, and
hanging up would be the wrong thing for a hypothetical client which was
planning to make more requests (though in practice, the git.git client
never would, and I doubt any other implementations would either).
The right thing is instead for the client to signal to the server that
it's not interested in making more requests. We can do that by closing
the pipe descriptor we use to write to ssh. This will propagate to the
server upload-pack as an EOF when it tries to read the next request (and
then it will close its half, and the whole connection will go away).
It's important to do this "half duplex" shutdown, because we have to do
it _before_ we actually receive the pack. This is an artifact of the way
fetch-pack and index-pack (or unpack-objects) interact. We hand the
connection off to index-pack (really, a sideband demuxer which feeds
it), and then wait until it returns. And it doesn't do that until it has
resolved all of the deltas in the pack, even though it was done reading
from the server long before.
So just closing the connection fully after index-pack returns would be
too late; we'd have held it open much longer than was necessary. And
teaching index-pack to close the connection is awkward. It's not even
seeing the whole conversation (the sideband demuxer is, but it doesn't
actually know what's in the packets, or when the end comes).
Note that this close() is happening deep within the transport code. It's
possible that a caller would want to perform other operations over the
same ssh transport after receiving the pack. But as of the current code,
none of the callers do, and there haven't been discussions of any plans
to change this. If we need to support that later, we can probably do so
by passing down a flag for "you're the last request on the transport;
it's OK to close" instead of the code just assuming that's true.
The description above all discusses v2 ssh, so it's worth thinking about
how this interacts with other protocols:
- in v0 protocols, we could do the same half-duplex shutdown (it just
goes into the v0 do_fetch_pack() instead). This does work, but since
it doesn't have the same persistence problem in the first place,
there's little reason to change it at this point.
- local fetches against git-upload-pack on the same machine will
behave the same as ssh (they are talking over two pipes, and see EOF
on their input pipe)
- fetches against git-daemon will run this same code, and close one of
the descriptors. In practice, this won't do anything, since there
our two descriptors are dups of each other, and not part of a
half-duplex pair. The right thing would probably be to call
shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't
face the same error-code problem (since it's just a TCP connection),
so it's really only an optimization problem. And git:// is not that
widely used these days, and has less impact on server resources than
an ssh termination.
- v2 http doesn't suffer from this problem in the first place, as our
pipes terminate at a local git-remote-https, which is passing data
along as individual requests via curl. Probably curl is keeping the
TCP/TLS connection open for more requests, and we might be able to
tell it manually "hey, we are done making requests now". But I think
that's much less important. It again doesn't suffer from the
error-code problem, and HTTP keepalive is pretty well understood
(importantly, the timeouts can be set low, because clients like curl
know how to reconnect for subsequent requests if necessary). So it's
probably not worth figuring out how to tell curl that we're done
(though if we do, this patch is probably the first step anyway;
fetch-pack closes the pipe back to remote-https, which would be the
signal that it should tell curl we're done).
The code is pretty straightforward. We close the pipe at the right
moment, and set it to -1 to mark it as invalid. I modified the later
cleanup code to avoid calling close(-1). That's not strictly necessary,
since close(-1) is a noop, but hopefully makes things a bit more obvious
to a reader.
I suspect that trying to call more transport functions after the close()
(e.g., calling transport_fetch_refs() again) would fail, as it's not
smart enough to realize we need to re-open the ssh connection. But
that's already true when v0 is in use. And no current callers want to do
that (and again, the solution is probably a flag in the transport code
to keep things open, which can be added later).
There's no test here, as the situation it covers is inherently racy (the
question is when upload-pack exits, compared to when index-pack finishes
resolving deltas and exits). The rather gross shell snippet below does
recreate the problematic situation; when run on a sufficiently-large
repository (git.git works fine), it kills an "idle" upload-pack while
the client is resolving deltas, leading to a failed clone.
(
git clone --no-local --progress . foo.git 2>&1
echo >&2 "clone exit code=$?"
) |
tr '\r' '\n' |
while read line
do
case "$done,$line" in
,Resolving*)
echo "hit resolving deltas; killing upload-pack"
killall -9 git-upload-pack
done=t
;;
esac
done
Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-clone started respecting errors from the transport subsystem in
aab179d937 (builtin/clone.c: don't ignore transport_fetch_refs() errors,
2020-12-03). However, that commit didn't handle the cleanup of the
filesystem quite right.
The cleanup of the directory that cmd_clone() creates is done by an
atexit() handler, which we control with a flag. It starts as
JUNK_LEAVE_NONE ("clean up everything"), then progresses to
JUNK_LEAVE_REPO when we know we have a valid repo but not working tree,
and then finally JUNK_LEAVE_ALL when we have a successful checkout.
Most errors cause us to die(), which then triggers the handler to do the
right thing based on how far into cmd_clone() we got. But the checks
added by aab179d937 instead set the "err" variable and then jump to a
new "cleanup" label, which then returns our non-zero status. However,
the code after the cleanup label includes setting the flag to
JUNK_LEAVE_ALL, and so we accidentally leave the repository and working
tree in place.
One obvious option to fix this is to reorder the end of the function to
set the flag first, before cleanup code, and put the label between them.
But we can observe another small bug: the error return from
transport_fetch_refs() is generally "-1", and we propagate that to the
return value of cmd_clone(), which ultimately becomes the exit code of
the process. And we try to avoid transmitting negative values via exit
codes (only the low 8 bits are passed along as an unsigned value, though
in practice for "-1" this at least retains the property that it's
non-zero).
Instead, let's just die(). That makes us consistent with rest of the
code in the function. It does add a new "fatal:" line to the output, but
I'd argue that's a good thing:
- in the rare case that the transport code didn't say anything, now
the user gets _some_ error message
- even if the transport code said something like "error: ssh died of
signal 9", it's nice to also say "fatal" to indicate that we
considered that to be a show-stopper.
Triggering this in the test suite turns out to be surprisingly
difficult. Almost every error we'd encounter, including ones deep inside
the transport code, cause us to just die() right there! However, one way
is to put a fake wrapper around git-upload-pack that sends the complete
packfile but exits with a failure code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The text was somewhat confusing between the revision itself and the author.
Signed-off-by: Reuven Yagel <robi@post.jce.ac.il>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These methods were marked as MAYBE_UNUSED in the previous change to
avoid a complicated diff. Delete them entirely, since we now use the
hashfile API instead of this custom hashing code.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The do_write_index() method in read-cache.c has its own hashing logic
and buffering mechanism. Specifically, the ce_write() method was
introduced by 4990aadc (Speed up index file writing by chunking it
nicely, 2005-04-20) and similar mechanisms were introduced a few months
later in c38138cd (git-pack-objects: write the pack files with a SHA1
csum, 2005-06-26). Based on the timing, in the early days of the Git
codebase, I figured that these roughly equivalent code paths were never
unified only because it got lost in the shuffle. The hashfile API has
since been used extensively in other file formats, such as pack-indexes,
multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to
unify the index writing code to use the same mechanism.
I discovered this disparity while trying to create a new index format
that uses the chunk-format API. That API uses a hashfile as its base, so
it is incompatible with the custom code in read-cache.c.
This rewrite is rather straightforward. It replaces all writes to the
temporary file with writes to the hashfile struct. This takes care of
many of the direct interactions with the_hash_algo.
There are still some git_hash_ctx uses remaining: the extension headers
are hashed for use in the End of Index Entries (EOIE) extension. This
use of the git_hash_ctx is left as-is. There are multiple reasons to not
use a hashfile here, including the fact that the data is not actually
writing to a file, just a hash computation. These hashes do not block
our adoption of the chunk-format API in a future change to the index, so
leave it as-is.
The internals of the algorithms are mostly identical. Previously, the
hashfile API used a smaller 8KB buffer instead of the 128KB buffer from
read-cache.c. The previous change already unified these sizes.
There is one subtle point: we do not pass the CSUM_FSYNC to the
finalize_hashfile() method, which differs from most consumers of the
hashfile API. The extra fsync() call indicated by this flag causes a
significant peformance degradation that is noticeable for quick
commands that write the index, such as "git add". Other consumers can
absorb this cost with their more complicated data structure
organization, and further writing structures such as pack-files and
commit-graphs is rarely in the critical path for common user
interactions.
Some static methods become orphaned in this diff, so I marked them as
MAYBE_UNUSED. The diff is much harder to read if they are deleted during
this change. Instead, they will be deleted in the following change.
In addition to the test suite passing, I computed indexes using the
previous binaries and the binaries compiled after this change, and found
the index data to be exactly equal. Finally, I did extensive performance
testing of "git update-index --force-write" on repos of various sizes,
including one with over 2 million paths at HEAD. These tests
demonstrated less than 1% difference in behavior. As expected, the
performance should be considered unchanged. The previous changes to
increase the hashfile buffer size from 8K to 128K ensured this change
would not create a peformance regression.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hashfile API uses a hard-coded buffer size of 8KB and has ever since
it was introduced in c38138c (git-pack-objects: write the pack files
with a SHA1 csum, 2005-06-26). It performs a similar function to the
hashing buffers in read-cache.c, but that code was updated from 8KB to
128KB in f279894 (read-cache: make the index write buffer size 128K,
2021-02-18). The justification there was that do_write_index() improves
from 1.02s to 0.72s. Since our end goal is to have the index writing
code use the hashfile API, we need to unify this buffer size to avoid a
performance regression.
There is a buffer, 'check_buffer', that is used to verify the check_fd
file descriptor. When this buffer increases to 128K to fit the data
being flushed, it causes the stack to overflow the limits placed in the
test suite. To avoid issues with stack size, move both 'buffer' and
'check_buffer' to be heap pointers within 'struct hashfile'. The
'check_buffer' member is left as NULL unless check_fd is set in
hashfd_check(). Both buffers are cleared as part of finalize_hashfile()
which also frees the full structure.
Since these buffers are now on the heap, we can adjust their size based
on the needs of the consumer. In particular, callers to
hashfd_throughput() are expecting to report progress indicators as the
buffer flushes. These callers would prefer the smaller 8k buffer to
avoid large delays between updates, especially for users with slower
networks. When the progress indicator is not used, the larger buffer is
preferrable.
By adding a new trace2 region in the chunk-format API, we can see that
the writing portion of 'git multi-pack-index write' lowers from ~1.49s
to ~1.47s on a Linux machine. These effects may be more pronounced or
diminished on other filesystems. The end-to-end timing is too noisy to
have a definitive change either way.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The xsize_t helper aims to safely convert an off_t to a size_t,
erroring out when a file offset is too large to fit into a memory
address. It does this by using two casts:
size_t size = (size_t) len;
if (len != (off_t) size)
... error out ...
On a platform with sizeof(size_t) < sizeof(off_t), this check is safe
and correct. The first cast truncates to a size_t by finding the
remainder modulo SIZE_MAX+1 (see C99 section 6.3.1.3 Signed and
unsigned integers) and the second promotes to an off_t, meaning the
result is true if and only if len is representable as a size_t.
On other platforms, this two-casts strategy still works well (always
succeeds) for len >= 0. But for len < 0, when the first cast succeeds
and produces SIZE_MAX + 1 + len, the resulting value is too large to
be represented as an off_t, so the second cast produces implementation
defined behavior. In practice, it is likely to produce a result of
true despite len not being representable as size_t.
Simplify by replacing with a more straightforward check: compare len
to the relevant bounds and then cast it. (To avoid a -Wsign-compare
warning, after checking that len >= 0, we explicitly convert to a
sufficiently-large unsigned type before comparing to SIZE_MAX.)
In practice, this is not likely to come up since typical callers use
nonnegative len. Still, it's helpful to handle this case to make the
behavior easy to reason about.
Historical note: the original bounds-checking in 46be82dfd0 (xsize_t:
check whether we lose bits, 2010-07-28) did not produce this
implementation-defined behavior, though it still did not handle
negative offsets. It was not until 73560c793a (git-compat-util.h:
xsize_t() - avoid -Wsign-compare warnings, 2017-09-21) introduced the
double cast that the implementation-defined behavior was triggered.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 1b0d9545bb.
That commit does fix the situation it intended to (avoiding Negotiate
even when the credentials were provided in the URL), but it creates a
more serious regression: we now never hit the conditional for "we had a
username and password, tried them, but the server still gave us a 401".
That has two bad effects:
1. we never call credential_reject(), and thus a bogus credential
stored by a helper will live on forever
2. we never return HTTP_NOAUTH, so the error message the user gets is
"The requested URL returned error: 401", instead of "Authentication
failed".
Doing this correctly seems non-trivial, as we don't know whether the
Negotiate auth was a problem. Since this is a regression in the upcoming
v2.23.0 release (for which we're in -rc0), let's revert for now and work
on a fix separately.
(Note that this isn't a pure revert; the previous commit added a test
showing the regression, so we can now flip it to expect_success).
Reported-by: Ben Humphreys <behumphreys@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We test authentication with http, and we independently test that
credential helpers work, but we don't have any tests that cover the
two features working together. Let's add two:
1. Make sure that a successful request asks the helper to save the
credential. This works as expected.
2. Make sure that a failed request asks the helper to forget the
credential. This is marked as expect_failure, as it was recently
regressed by 1b0d9545bb (remote-curl: fall back to basic auth if
Negotiate fails, 2021-03-22). The symptom here is that the second
request should prompt the user, but doesn't.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes new people are confused by how a revision "range" works,
in that it is not a random collection of commits but a set of
commits that are all connected to each other, and most Git commands
work on a single such "range".
Give an example to clarify it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The flush() logic in csum-file.c was introduced originally by c38138c
(git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26)
and a portion of the logic performs similar utility to write_in_full()
in wrapper.c. The history of write_in_full() is full of moves and
renames, but was originally introduced by 7230e6d (Add write_or_die(), a
helper function, 2006-08-21).
The point of these sections of code are to flush a write buffer using
xwrite() and report errors in the case of disk space issues or other
generic input/output errors. The logic in flush() can interpret the
output of write_in_full() to provide the correct error messages to
users.
The logic in the hashfile API has an additional set of logic to augment
the progress indicator between calls to xwrite(). This was introduced by
2a128d6 (add throughput display to git-push, 2007-10-30). It seems that
since the hashfile's buffer is only 8KB, these additional progress
indicators might not be incredibly necessary. Instead, update the
progress only when write_in_full() complete.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While testing the sparse-index, I verified a test with --valgrind and it
complained about an uninitialized value being used in a jump in the
path_matches_pattern_list() method. The line was this one:
if (*dtype == DT_UNKNOWN)
In the call stack, the culprit was the initialization of the dtype
variable in convert_to_sparse_rec().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An object_id storing a SHA-1 name has some unused bytes at the end of
the hash array. Since these bytes are not used, they are usually not
initialized to any value either. However, at
parallel_checkout.c:send_one_item() the object_id of a cache entry is
copied into a buffer which is later sent to a checkout worker through a
pipe write(). This makes Valgrind complain about passing uninitialized
bytes to a syscall. The worker won't use these uninitialized bytes
either, but the warning could confuse someone trying to debug this code;
So instead of using oidcpy(), send_one_item() uses hashcpy() to only
copy the used/initialized bytes of the object_id, and leave the
remaining part with zeros.
However, since cf0983213c ("hash: add an algo member to struct
object_id", 2021-04-26), using hashcpy() is no longer sufficient here as
it won't copy the new algo field from the object_id. Let's add and use a
new function which meets both our requirements of copying all the
important object_id data while still avoiding the uninitialized bytes,
by padding the end of the hash array in the destination object_id. With
this change, we also no longer need the destination buffer from
send_one_item() to be initialized with zeros, so let's switch from
xcalloc() to xmalloc() to make this clear.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The C_LOCALE_OUTPUT prerequisite was removed in b1e079807b (tests:
remove last uses of C_LOCALE_OUTPUT, 2021-02-11), where Ævar noted:
I'm not leaving the prerequisite itself in place for in-flight changes
as there currently are none that introduce new tests that rely on it,
and because C_LOCALE_OUTPUT is currently a noop on the master branch
we likely won't have any new submissions that use it.
One more use of C_LOCALE_OUTPUT did creep in with 3d1bda6b5b (t7500: add
tests for --fixup=[amend|reword] options, 2021-03-15). This causes a
number of the tests to be skipped by default:
ok 35 # SKIP --fixup=reword: incompatible with --all (missing C_LOCALE_OUTPUT)
ok 36 # SKIP --fixup=reword: incompatible with --include (missing C_LOCALE_OUTPUT)
ok 37 # SKIP --fixup=reword: incompatible with --only (missing C_LOCALE_OUTPUT)
ok 38 # SKIP --fixup=reword: incompatible with --interactive (missing C_LOCALE_OUTPUT)
ok 39 # SKIP --fixup=reword: incompatible with --patch (missing C_LOCALE_OUTPUT)
Remove the C_LOCALE_OUTPUT prerequisite from these tests so they are
not skipped.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These error messages are intended for the user. Let's touch them up
since we're here from the previous commit.
Signed-off-by: Wolfgang Müller <wolf@oriole.systems>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calling "git rev-parse --path-format" without an argument segfaults
instead of giving an error message. Commit fac60b8925 (rev-parse: add
option for absolute or relative path formatting, 2020-12-13) added the
argument parsing code but forgot to handle NULL.
Returning an error makes sense here because there is no default value we
could use. Add a test case to verify.
Signed-off-by: Wolfgang Müller <wolf@oriole.systems>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Eliminated 'Negation of emptiness' of 'nenhum' (not one/none)
* Eliminated 'Negation of emptiness' of 'nada' (nothing)
* Transformed 'Não' (No) into affirmative
* Some other translations
* Transforming 'não' (no) into affirmative
* From junção-de-3 to tri-junção
Signed-off-by: Daniel Santos <hello@brighterdan.com>
Clarify the default length used for the abbreviated form used for
commits in git describe.
The behavior was modified in Git 2.11.0, but the documentation was not
updated to clarify the new behavior.
Signed-off-by: Anders Höckersten <anders@hockersten.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sendemail.smtpServer configuration option and --smtp-server command
line option both support using a sendmail-like program to send emails by
specifying an absolute file path. However, this is not ideal for the
following reasons:
1. It overloads the meaning of smtpServer (now a program is being used
for the server?)
2. It doesn't allow for non-absolute paths, arguments, or arbitrary
scripting
Requiring an absolute path is bad for portability, as the same program
may be in different locations on different systems. If a user wishes to
pass arguments to their program, they have to use the smtpServerOption
option, which is cumbersome (as it must be repeated for each option) and
doesn't adhere to normal git conventions.
Introduce a new configuration option sendemail.sendmailCmd as well as a
command line option --sendmail-cmd that can be used to specify a command
(with or without arguments) or shell expression to run to send email.
The name of this option is consistent with --to-cmd and --cc-cmd. This
invocation honors the user's $PATH so that absolute paths are not
necessary. Arbitrary shell expressions are also supported, allowing
users to do basic scripting.
Give this option a higher precedence over --smtp-server and
sendemail.smtpServer, as the new interface is more flexible. For
backward compatibility, continue to support absolute paths in
--smtp-server and sendemail.smtpServer.
Signed-off-by: Gregory Anders <greg@gpanders.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to handle options recently added to "git stash show"
around untracked part of the stash segfaulted when these options
were used on a stash entry that does not record untracked part.
* dl/stash-show-untracked-fixup:
stash show: fix segfault with --{include,only}-untracked
t3905: correct test title
When "git update-ref -d" removes a ref that is packed, it left
empty directories under $GIT_DIR/refs/ for
* wc/packed-ref-removal-cleanup:
refs: cleanup directories when deleting packed ref
"git mailinfo" (hence "git am") learned the "--quoted-cr" option to
control how lines ending with CRLF wrapped in base64 or qp are
handled.
* dd/mailinfo-quoted-cr:
am: learn to process quoted lines that ends with CRLF
mailinfo: allow stripping quoted CR without warning
mailinfo: allow squelching quoted CRLF warning
mailinfo: warn if CRLF found in decoded base64/QP email
mailinfo: stop parsing options manually
mailinfo: load default metainfo_charset lazily
The final part of "parallel checkout".
* mt/parallel-checkout-part-3:
ci: run test round with parallel-checkout enabled
parallel-checkout: add tests related to .gitattributes
t0028: extract encoding helpers to lib-encoding.sh
parallel-checkout: add tests related to path collisions
parallel-checkout: add tests for basic operations
checkout-index: add parallel checkout support
builtin/checkout.c: complete parallel checkout support
make_transient_cache_entry(): optionally alloc from mem_pool
"git push" learns to discover common ancestor with the receiving
end over protocol v2.
* jt/push-negotiation:
send-pack: support push negotiation
fetch: teach independent negotiation (no packfile)
fetch-pack: refactor command and capability write
fetch-pack: refactor add_haves()
fetch-pack: refactor process_acks()
These strings have not been modified in any translation, nor should they
be.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add -i --dry-run" does not dry-run, which was surprising. The
combination of options has taught to error out.
* ow/no-dryrun-in-add-i:
add: die if both --dry-run and --interactive are given
"git p4" learned to find branch points more efficiently.
* jk/p4-locate-branch-point-optim:
git-p4: speed up search for branch parent
git-p4: ensure complex branches are cloned correctly
Over-the-wire protocol learns a new request type to ask for object
sizes given a list of object names.
* ba/object-info:
object-info: support for retrieving object info
The word-diff mode has been taught to work better with a word
regexp that can match an empty string.
* pw/word-diff-zero-width-matches:
word diff: handle zero length matches
In the original ref-filter design, it will copy the parsed
atom's name and attributes to `used_atom[i].name` in the
atom's parsing step, and use it again for string matching
in the later specific ref attributes filling step. It use
a lot of string matching to determine which atom we need.
Introduce the enum "atom_type", each enum value is named
as `ATOM_*`, which is the index of each corresponding
valid_atom entry. In the first step of the atom parsing,
`used_atom.atom_type` will record corresponding enum value
from valid_atom entry index, and then in specific reference
attribute filling step, only need to compare the value of
the `used_atom[i].atom_type` to check the atom type.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the support for "objectsize:disk" was bolted onto the
existing support for "objectsize", it didn't follow the
usual pattern for handling "atomtype:modifier", which reads
the <modifier> part just once while parsing the format
string, and store the parsed result in the union in the
used_atom structure, so that the string form of it does not
have to be parsed over and over at runtime (e.g. in
grab_common_values()).
Add a new member `objectsize` to the union `used_atom.u`,
so that we can separate the check of <modifier> from the
check of <atomtype>, this will bring scalability to atom
`%(objectsize)`.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 878f988350 (t/test-lib: teach --chain-lint to detect broken
&&-chains in subshells, 2018-07-11) introduced additional chain-lint
tests which add an extra "sed" pipeline to each test we run. This has a
measurable impact on runtime. Here are timings with and without a new
environment variable (added by this patch) that lets you disable just
the additional sed-based chain-lint tests:
Benchmark #1: GIT_TEST_CHAIN_LINT_HARDER=1 make test
Time (mean ± σ): 64.202 s ± 1.030 s [User: 622.469 s, System: 301.402 s]
Range (min … max): 61.571 s … 65.662 s 10 runs
Benchmark #2: GIT_TEST_CHAIN_LINT_HARDER=0 make test
Time (mean ± σ): 57.591 s ± 0.333 s [User: 529.368 s, System: 270.618 s]
Range (min … max): 57.143 s … 58.309 s 10 runs
Summary
'GIT_TEST_CHAIN_LINT_HARDER=0 make test' ran
1.11 ± 0.02 times faster than 'GIT_TEST_CHAIN_LINT_HARDER=1 make test'
Of course those extra lint checks are doing something useful, so paying
a few extra seconds (at least on Linux) isn't so bad (though note the
CPU time; we're bounded in our parallel run here by the slowest test, so
it really is ~120s of CPU improvement).
But we can observe that there are some test scripts where they produce a
much stronger effect, and provide less value. In t0027 and t3070 we run
a very large number of small tests, all driven by a series of
functions/loops which are filling in the test bodies. There we get much
less bang for our buck in terms of bug-finding versus CPU cost.
This patch introduces a mechanism for controlling when those extra
lint checks are run, at two levels:
- a user can ask to disable or to force-enable the checks by setting
GIT_TEST_CHAIN_LINT_HARDER
- if the user hasn't specified a preference, individual scripts can
disable the checks by setting GIT_TEST_CHAIN_LINT_HARDER_DEFAULT;
scripts which don't set that get the current behavior of enabling
them.
In addition, this patch flips the default for t0027 and t3070's
mass-generated sections to disable the extra checks. Here are the timing
results for t0027:
Benchmark #1: GIT_TEST_CHAIN_LINT_HARDER=1 ./t0027-auto-crlf.sh
Time (mean ± σ): 17.078 s ± 0.848 s [User: 14.878 s, System: 7.075 s]
Range (min … max): 15.952 s … 18.421 s 10 runs
Benchmark #2: GIT_TEST_CHAIN_LINT_HARDER=0 ./t0027-auto-crlf.sh
Time (mean ± σ): 9.063 s ± 0.759 s [User: 7.890 s, System: 3.362 s]
Range (min … max): 7.747 s … 10.619 s 10 runs
Benchmark #3: ./t0027-auto-crlf.sh
Time (mean ± σ): 9.186 s ± 0.881 s [User: 7.957 s, System: 3.427 s]
Range (min … max): 7.796 s … 10.498 s 10 runs
Summary
'GIT_TEST_CHAIN_LINT_HARDER=0 ./t0027-auto-crlf.sh' ran
1.01 ± 0.13 times faster than './t0027-auto-crlf.sh'
1.88 ± 0.18 times faster than 'GIT_TEST_CHAIN_LINT_HARDER=1 ./t0027-auto-crlf.sh'
We can see that disabling the checks for the whole script buys us an
almost 2x speedup. But the new default behavior, disabling them only for
the mass-generated part, gets us most of that speedup (but still leaves
the checks on for further manual tests people might write).
As a side note, I'd caution about comparing runtimes and CPU seconds
between this timing and the earlier "make test" one. In "make test",
we're running a lot of scripts in parallel, so the CPU is throttling
down (and thus a CPU second saved here would count for more during a
parallel run; the same work takes more CPU seconds there).
We get similar results for t3070:
Benchmark #1: GIT_TEST_CHAIN_LINT_HARDER=1 ./t3070-wildmatch.sh
Time (mean ± σ): 20.054 s ± 3.967 s [User: 16.003 s, System: 8.286 s]
Range (min … max): 11.891 s … 23.671 s 10 runs
Benchmark #2: GIT_TEST_CHAIN_LINT_HARDER=0 ./t3070-wildmatch.sh
Time (mean ± σ): 12.399 s ± 2.256 s [User: 7.542 s, System: 5.342 s]
Range (min … max): 9.606 s … 15.727 s 10 runs
Benchmark #3: ./t3070-wildmatch.sh
Time (mean ± σ): 10.726 s ± 3.476 s [User: 6.790 s, System: 4.365 s]
Range (min … max): 5.444 s … 15.376 s 10 runs
Summary
'./t3070-wildmatch.sh' ran
1.16 ± 0.43 times faster than 'GIT_TEST_CHAIN_LINT_HARDER=0 ./t3070-wildmatch.sh'
1.87 ± 0.71 times faster than 'GIT_TEST_CHAIN_LINT_HARDER=1 ./t3070-wildmatch.sh'
Again, we get almost a 2x speedup disabling these. In this case, there
are no tests not covered by the script's "default to disable" behavior,
so the second two benchmarks should be the same (and while they do
differ, you can see the variance is quite high but they're within one
standard deviation).
So it seems like for these two scripts, at least, disabling the extra
checks is a reasonable tradeoff. Sadly, the overall runtime of "make
test" on my system doesn't get much faster. But that's because we're
mostly limited by the cost of the single biggest test. Here are the
top-5 tests by wall-clock time from a parallel run, before my patch:
57.9192368984222 t9001-send-email.sh
45.6329638957977 t0027-auto-crlf.sh
32.5278220176697 t3070-wildmatch.sh
22.2701289653778 t7610-mergetool.sh
20.8635759353638 t1701-racy-split-index.sh
And after:
57.1476998329163 t9001-send-email.sh
33.776211977005 t0027-auto-crlf.sh
21.3116669654846 t7610-mergetool.sh
20.7748689651489 t1701-racy-split-index.sh
19.6957249641418 t7112-reset-submodule.sh
We dropped 12s from t0027, and t3070 dropped off our list entirely at
around 16s. In both cases we're bound by t9001, but its slowness is
due to the actual tests, so we'll have to deal with it in a different
way. But this reduces overall CPU, and means that dealing with t9001 (by
improving the speed of send-email or splitting it apart) will let us
reduce our overall runtime even on multi-core machines.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit afda36dbf3 ("git-prompt: include sparsity state as well",
2020-06-21) added the use of some variables to control how to show
sparsity state in the git prompt, but implicitly assumed that undefined
variables would be treated as the empty string. This breaks users who
run under 'set -u'; fix the code to be more explicit.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git stash show --include-untracked` or
`git stash show --only-untracked` is run on a stash that doesn't include
an untracked entry, a segfault occurs. This happens because we do not
check whether the untracked entry is actually present and just attempt
to blindly dereference it.
Ensure that the untracked entry is present before actually attempting to
dereference it.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We reference the non-existent option `git stash show --show-untracked`
when we really meant `--only-untracked`. Correct the test title
accordingly.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many places in the code were doing
while ((d = readdir(dir)) != NULL) {
if (is_dot_or_dotdot(d->d_name))
continue;
...process d...
}
Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
...process d...
}
This helper particularly simplifies checks for empty directories.
Also use this helper in read_cached_dir() so that our statistics are
consistent across platforms. (In other words, read_cached_dir() should
have been using is_dot_or_dotdot() and skipping such entries, but did
not and left it to treat_path() to detect and mark such entries as
path_none.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.
Update the comments for treat_directory() to use these flag names rather
than the old member names.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored. We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED. It
was not obvious, though, because the logic for untracked and excluded
files had been fused together making it harder to reason about. The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing. Add it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways. Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.
For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded. Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded). The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong. It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse. Fix the logic
accordingly.
Some sidenotes about possible confusion with dir.c:
* "ignored" often refers to an untracked ignore", i.e. a file which is
not tracked which matches one of the ignore/exclusion rules. But you
can also have a "tracked ignore", a tracked file that happens to match
one of the ignore/exclusion rules and which dir.c has to worry about
since "git ls-files -c -i" is supposed to list them.
* The dir code often uses "ignored" and "excluded" interchangeably,
which you need to keep in mind while reading the code.
* "exclude" is used multiple ways in the code:
* As noted above, "exclude" is often a synonym for "ignored".
* The logic for parsing .gitignore files was re-used in
.git/info/sparse-checkout, except there it is used to mark paths that
the user wants to *keep*. This was mostly addressed by commit
65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
2019-09-03), but every once in a while you'll find a comment about
"exclude" referring to these patterns that might in fact be in use
by the sparse-checkout machinery for inclusion rules.
* The word "EXCLUDE" is also used for pathspec negation, as in
(pathspec->items[3].magic & PATHSPEC_EXCLUDE)
Thus if a user had a .gitignore file containing
*~
*.log
!settings.log
And then ran
git add -- 'settings.*' ':^settings.log'
Then :^settings.log is a pathspec negation making settings.log not
be requested to be added even though all other settings.* files are
being added. Also, !settings.log in the gitignore file is a negative
exclude pattern meaning that settings.log is normally a file we
want to track even though all other *.log files are ignored.
Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.
Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.
Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The PNPM package manager is apparently creating deeply nested (but
ignored) directory structures; traversing them is costly
performance-wise, unnecessary, and in some cases is even throwing
warnings/errors because the paths are too long to handle on various
platforms. Add a testcase that checks for such unnecessary directory
traversal.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ls-files --ignored can be used together with either --others or
--cached. After being perplexed for a bit and digging in to the code, I
assumed that ls-files -i was just broken and not printing anything and
I had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.
While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well. In fact, of two uses in our
testsuite, I believe one of the two did make this error. In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.
-i will most the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break. Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide more statistics in trace2 output that include the number of
directories and total paths visited by the directory traversal logic.
Subsequent patches will take advantage of this to ensure we do not
unnecessarily traverse into ignored directories.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 07d90eadb5 (Makefile: add Perl runtime prefix support,
2018-04-10) PERL_DEFINES has been a simply-expanded variable, let's
make it recursively expanded instead.
This change doesn't matter for the correctness of the logic. Whether
we used simply-expanded or recursively expanded didn't change what we
wrote out in GIT-PERL-DEFINES, but being consistent with other rules
makes this easier to understand.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remote-https process needs to update it's own instance of
`the_repository' when it sees an HTTP(S) remote is using sha256.
Without this, parse_oid_hex() fails to handle sha256 OIDs when
it's eventually called by parse_fetch().
Tested with:
git clone https://yhbt.net/sha256test.git
GIT_SMART_HTTP=0 git clone https://yhbt.net/sha256test.git
(plain http:// also works)
Cloning the URL via git:// required no changes
Signed-off-by: Eric Wong <e@80x24.org>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
used_atom.u is an union, and it has different members depending on
what atom the auxiliary data the union part of the "struct
used_atom" wants to record. At most only one of the members can be
valid at any one time. Since the code checks u.remote_ref without
even making sure if the atom is "push" or "push:" (which are only
two cases that u.remote_ref.push becomes valid), but u.remote_ref
shares the same storage for other members of the union, the check
was reading from an invalid member, which was the bug.
Modify the condition here to check whether the atom name
equals to "push" or starts with "push:", to avoid reading the
value of invalid member of the union.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: further test fixes]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes two memory leaks when running `git maintenance start` or `git
maintenance stop` in `update_background_schedule`:
$ valgrind --leak-check=full ~/git/bin/git maintenance start
==76584== Memcheck, a memory error detector
==76584== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==76584== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==76584== Command: /home/lenaic/git/bin/git maintenance start
==76584==
==76584==
==76584== HEAP SUMMARY:
==76584== in use at exit: 34,880 bytes in 252 blocks
==76584== total heap usage: 820 allocs, 568 frees, 146,414 bytes allocated
==76584==
==76584== 65 bytes in 1 blocks are definitely lost in loss record 17 of 39
==76584== at 0x483E6AF: malloc (vg_replace_malloc.c:306)
==76584== by 0x3DC39C: xrealloc (wrapper.c:126)
==76584== by 0x3992CC: strbuf_grow (strbuf.c:98)
==76584== by 0x39A473: strbuf_vaddf (strbuf.c:392)
==76584== by 0x39BC54: xstrvfmt (strbuf.c:979)
==76584== by 0x39BD2C: xstrfmt (strbuf.c:989)
==76584== by 0x18451B: update_background_schedule (gc.c:1977)
==76584== by 0x1846F6: maintenance_start (gc.c:2011)
==76584== by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584== by 0x127A2E: run_builtin (git.c:453)
==76584== by 0x127E81: handle_builtin (git.c:704)
==76584== by 0x128142: run_argv (git.c:771)
==76584==
==76584== 240 bytes in 1 blocks are definitely lost in loss record 29 of 39
==76584== at 0x4840D7B: realloc (vg_replace_malloc.c:834)
==76584== by 0x491CE5D: getdelim (in /usr/lib/libc-2.33.so)
==76584== by 0x39ADD7: strbuf_getwholeline (strbuf.c:635)
==76584== by 0x39AF31: strbuf_getdelim (strbuf.c:706)
==76584== by 0x39B064: strbuf_getline_lf (strbuf.c:727)
==76584== by 0x184273: crontab_update_schedule (gc.c:1919)
==76584== by 0x184678: update_background_schedule (gc.c:1997)
==76584== by 0x1846F6: maintenance_start (gc.c:2011)
==76584== by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584== by 0x127A2E: run_builtin (git.c:453)
==76584== by 0x127E81: handle_builtin (git.c:704)
==76584== by 0x128142: run_argv (git.c:771)
==76584==
==76584== LEAK SUMMARY:
==76584== definitely lost: 305 bytes in 2 blocks
==76584== indirectly lost: 0 bytes in 0 blocks
==76584== possibly lost: 0 bytes in 0 blocks
==76584== still reachable: 34,575 bytes in 250 blocks
==76584== suppressed: 0 bytes in 0 blocks
==76584== Reachable blocks (those to which a pointer was found) are not shown.
==76584== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==76584==
==76584== For lists of detected and suppressed errors, rerun with: -s
==76584== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way the command line specified by the trailer.<token>.command
configuration variable receives the end-user supplied value was
both error prone and misleading. An alternative to achieve the
same goal in a safer and more intuitive way has been added, as
the trailer.<token>.cmd configuration variable, to replace it.
* zh/trailer-cmd:
trailer: add new .cmd config option
docs: correct descript of trailer.<token>.command
Various test and documentation updates about .gitsomething paths
that are symlinks.
* jk/symlinked-dotgitx-cleanup:
docs: document symlink restrictions for dot-files
fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
t0060: test ntfs/hfs-obscured dotfiles
t7450: test .gitmodules symlink matching against obscured names
t7450: test verify_path() handling of gitmodules
t7415: rename to expand scope
fsck_tree(): wrap some long lines
fsck_tree(): fix shadowed variable
t7415: remove out-dated comment about translation
Options to "git pack-objects" that take numeric values like
--window and --depth should not accept negative values; the input
validation has been tightened.
* jk/pack-objects-negative-options-fix:
pack-objects: clamp negative depth to 0
t5316: check behavior of pack-objects --depth=0
pack-objects: clamp negative window size to 0
t5300: check that we produced expected number of deltas
t5300: modernize basic tests
"git submodule update --quiet" did not propagate the quiet option
down to underlying "git fetch", which has been corrected.
* nc/submodule-update-quiet:
submodule update: silence underlying fetch with "--quiet"
A few variants of informational message "Already up-to-date" has
been rephrased.
* js/merge-already-up-to-date-message-reword:
merge: fix swapped "up to date" message components
merge(s): apply consistent punctuation to "up to date" messages
"git bisect skip" when custom words are used for new/old did not
work, which has been corrected.
* rj/bisect-skip-honor-terms:
bisect--helper: use BISECT_TERMS in 'bisect skip' command
When deleting a packed ref via 'update-ref -d', a lockfile is made in
the directory that would contain the loose copy of that ref, creating
any directories in the ref's path that do not exist. When the
transaction completes, the lockfile is deleted, but any empty parent
directories made when creating the lockfile are left in place. These
empty directories are not removed by 'pack-refs' or other housekeeping
tasks and will accumulate over time.
When deleting a loose ref, we remove all empty parent directories at the
end of the transaction.
This commit applies the parent directory cleanup logic used when
deleting loose refs to packed refs as well.
Signed-off-by: Will Chandler <wfc@wfchandler.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a subshell added in a preceding commit to instead use a new
"-C" option to "check_describe". The idiom for this is copied as-is
from the "test_commit" function in test-lib-functions.sh
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a nested invocation of "test_expect_success", the
"check_describe()" function is a wrapper for calling
test_expect_success, and therefore needs to be called outside the body
of another "test_expect_success".
The two tests added in 30b1c7ad9d (describe: don't abort too early
when searching tags, 2020-02-26) were not testing for anything due to
this logic error. Without this fix reverting the C code changes in
that commit still has all tests passing, with this fix we're actually
testing the "describe" output. This is because "test_expect_success"
calls "test_finish_", whose last statement happens to be true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the one test that relied on the "err.actual" file produced by
check_describe() to instead do its own check of "git describe"
output.
This means that the two tests won't have an inter-dependency (e.g. if
the earlier test is skipped).
An earlier version of this patch instead asserted that no other test
had any output on stderr. We're not doing that here out of fear that
"gc --auto" or another future change to "git describe" will cause it
to legitimately emit output on stderr unexpectedly[1].
I'd think that inverting the test added in 3291fe4072 (Add
git-describe test for "verify annotated tag names on output",
2008-03-03) to make checking that we don't have warnings the rule
rather than the exception would be the sort of thing the describe
tests should be catching, but for now let's leave it as it is.
1. http://lore.kernel.org/git/xmqqwnuqo8ze.fsf@gitster.c.googlers.com
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the glob matching via a "case" statement to a "test_cmp" after
we've stripped out the hash-specific g<hash-abbrev>
suffix. 5312ab11fb (Add describe test., 2007-01-13).
This means that we can use test_cmp to compare the output. I could
omit the "-8" change of e.g. "A-*" to "A-8-gHASH", but I think it
makes sense to test that here explicitly. It means you need to add new
tests to the bottom of the file, but that's not a burden in this case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve tests added in 9f67d2e827 (Teach "git describe" --dirty
option, 2009-10-21) and 2ed5c8e174 (describe: setup working tree for
--dirty, 2019-02-03) so that they make sense in combination with each
other.
The "check_describe" being removed here was the earlier test, we then
later added these --work-tree tests which really just wanted to check
if we got the exact same output from "describe", but the test wasn't
structured to test for that.
Let's change it to do that, which both improves test coverage and
makes it more obvious what's going on here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the dummy discard_hunk_line() function added in
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.
Unlike the recently amended xdiff_emit_line_fn interface which'll be
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once.
It makes more sense to pass this as a flag than provide a dummy
callback because that function may be able to skip doing certain work
if it knows the caller is doing nothing with the hunk header.
It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.
The reason we're putting a flag in the gap between 1<<0 and 1<<2 is
that the old 1<<1 flag was removed in 907681e940 (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the code added in 611e42a598 (xdiff: provide a separate emit
callback for hunks, 2018-11-02) to be more readable by using
designated initializers.
This changes "priv" in rerere.c to be initialized to NULL as we did in
merge-tree.c. That's not needed as we'll only use it if the callback
is defined, but being consistent here is better and less verbose.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of special-casing creations and deletions let's just generate
a diff for them.
This logic of not running a diff under -G if we don't have both sides
dates back to the original implementation of -S in
52e9578985 ([PATCH] Introducing software archaeologist's tool
"pickaxe"., 2005-05-21).
In the case of -S we were not working with the xdiff interface and
needed to do this, but when -G was implemented in f506b8e8b5 (git
log/diff: add -G<regexp> that greps in the patch text, 2010-08-23)
this logic was diligently copied over.
But as the performance test added earlier in this series shows, this
does not make much of a difference. With:
time GIT_TEST_LONG= GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' ./run origin/next HEAD~ HEAD -- p4209-pickaxe.sh
With the HEAD~ commit being the preceding "pickaxe -G: terminate early
on matching lines" we get these results. Note that it's only the -G
codepaths that are relevant to this change:
Test origin/next HEAD~ HEAD
-----------------------------------------------------------------------------------------------------------------------------------------
4209.1: git log -S'int main' <limit-rev>.. 0.35(0.32+0.03) 0.35(0.33+0.02) +0.0% 0.35(0.30+0.05) +0.0%
4209.2: git log -S'æ' <limit-rev>.. 0.46(0.42+0.04) 0.46(0.41+0.05) +0.0% 0.46(0.42+0.04) +0.0%
4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>.. 0.65(0.62+0.02) 0.64(0.61+0.02) -1.5% 0.64(0.60+0.04) -1.5%
4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>.. 0.52(0.45+0.06) 0.52(0.50+0.01) +0.0% 0.54(0.47+0.04) +3.8%
4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.39(0.34+0.05) 0.39(0.34+0.04) +0.0% 0.39(0.36+0.03) +0.0%
4209.6: git log -G'(int|void|null)' <limit-rev>.. 0.60(0.55+0.04) 0.58(0.54+0.03) -3.3% 0.58(0.49+0.08) -3.3%
4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>.. 0.61(0.52+0.06) 0.59(0.53+0.05) -3.3% 0.59(0.54+0.05) -3.3%
4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.61(0.51+0.07) 0.58(0.54+0.04) -4.9% 0.57(0.51+0.06) -6.6%
4209.9: git log -i -S'int main' <limit-rev>.. 0.36(0.31+0.04) 0.36(0.34+0.02) +0.0% 0.35(0.32+0.03) -2.8%
4209.10: git log -i -S'æ' <limit-rev>.. 0.36(0.33+0.03) 0.39(0.34+0.01) +8.3% 0.36(0.32+0.03) +0.0%
4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>.. 0.83(0.77+0.05) 0.82(0.77+0.05) -1.2% 0.80(0.75+0.04) -3.6%
4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>.. 0.67(0.61+0.03) 0.64(0.61+0.03) -4.5% 0.63(0.61+0.02) -6.0%
4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.40(0.37+0.02) 0.40(0.37+0.03) +0.0% 0.40(0.36+0.04) +0.0%
4209.14: git log -i -G'(int|void|null)' <limit-rev>.. 0.58(0.51+0.07) 0.59(0.52+0.06) +1.7% 0.58(0.52+0.05) +0.0%
4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>.. 0.60(0.54+0.05) 0.60(0.54+0.06) +0.0% 0.60(0.56+0.03) +0.0%
4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.58(0.51+0.06) 0.57(0.52+0.05) -1.7% 0.60(0.48+0.09) +3.4%
This small simplification really doesn't buy us much now, but I've got
plans to both convert the pickaxe code to using a PCREv2 backend[1]
and to implement additional pickaxe modes to do custom searches
through the diff[2]. Always having the diff available under -G is
going to help to simplify both of those changes.
1. https://lore.kernel.org/git/20210203032811.14979-22-avarab@gmail.com/
2. https://lore.kernel.org/git/20190424152215.16251-3-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solve a long-standing item for "git log -Grx" of us e.g. finding "+
str" in the diff context and noting that we had a "hit", but xdiff
diligently continuing to generate and spew the rest of the diff at
us. This makes use of a new "early return" xdiff interface added by
preceding commits.
The TODO item (or, the NEEDSWORK comment) has been there since "git
log -G" was implemented. See f506b8e8b5 (git log/diff: add -G<regexp>
that greps in the patch text, 2010-08-23).
But now with the support added in the preceding changes to the
xdiff-interface we can return early. Let's assert the behavior of that
new early-return xdiff-interface by having a BUG() call here to die if
it ever starts handing us needless work again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finish the change started in the preceding commit and allow an early
return from "xdiff_emit_line_fn" callbacks, this will allows
diffcore-pickaxe.c to save itself redundant work.
Our xdiff interface also had the limitation of not being able to abort
early since the beginning, see d9ea73e056 (combine-diff: refactor
built-in xdiff interface., 2006-04-05). Although at that time
"xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
"xdiff_emit_hunk_fn" didn't exist yet.
There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf (diff: use hunk callback for word-diff,
2018-11-02).
In combination those two changes allow us to not do any work on the
hunks and diff at all, but didn't change the status quo with regards
to consumers that e.g. want the diff lines, but might want to abort
early.
Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
if that's all we needed.
This interface is rather scary as noted in the comment to
xdiff-interface.h being added here, as noted there a future change
could add more exit codes, and hack xdl_emit_diff() and friends to
ignore or skip things more selectively as a result.
I did not see an inherent reason for why xdl_emit_{diffrec,record}()
could not be changed to ferry the "xdiff_emit_line_fn" error code
upwards instead of returning -1 on all "ret < 0".
But doing so would require corresponding changes in xdl_emit_diff(),
xdl_diff(). I didn't see any issue with narrowly doing that to
accomplish what I needed here, but it would leave xdiff's own return
values in an inconsistent state.
Instead I've left it at returning a more conventional (for git's own
codebase) 1 for an early return, and translating it (or rather, all
non-zero) to -1 for xdiff's consumption.
The reason for most of the "stop" complexity in xdiff_outf() is
because we want to be able to abort early, but do so in a way that
doesn't skip the appropriate strbuf_reset() invocations.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the function prototype of xdiff_emit_line_fn to return an "int"
instead of "void". Change all of those functions to "return 0",
nothing checks those return values yet, and no behavior is being
changed.
In subsequent commits the interface will be changed to allow early
return via this new return value.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the "log -S<pat>" switch counts occurrences of <pat> on the
pre-image and post-image of a change. As soon as we know we had e.g. 1
before and 2 now we can stop, we don't need to keep counting past 2.
With this change a diff between A and B may have different performance
characteristics than between B and A. That's OK in this case, since
we'll emit the same output, and the effect is to make one of them
better.
I'm picking a check of "one" first on the assumption that it's a more
common case to have files grow over time than not.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the {one,two}_contains variables to c{1,2}. This will make a
follow-up change easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so
that we won't abort early on content that has NULs in it.
We've had a hard requirement on REG_STARTEND since 2f8952250a (regex:
add regexec_buf() that can work on a non NUL-terminated string,
2016-09-21), but this sanity check dates back to d01d8c6782 (Support
for pickaxe matching regular expressions, 2006-03-29).
It wasn't needed anymore, and as the now-passing test shows, actively
getting in our way. Since we always require REG_STARTEND support we do
not need to stop at NULs. If we are dealing with a haystack with NUL
in it. The needle may be behind that NUL.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Assert early in diffcore_pickaxe() that we've got a needle to work
with under -G and -S.
This code is redundant to the check -G and -S get from
parse-options.c's get_arg(), which I'm adding a test for.
This check dates back to e1b161161d (diffcore-pickaxe: fix infinite
loop on zero-length needle, 2007-01-25) when "git log -S" could send
this code into an infinite loop.
It was then later refactored in 8fa4b09fb1 (pickaxe: hoist empty
needle check, 2012-10-28) into its current form, but it seemingly
wasn't noticed that in the meantime a move to the parse-options.c API
in dea007fb4c (diff: parse separate options like -S foo, 2010-08-05)
had made it redundant.
Let's retain some of the paranoia here with a BUG(), but there's no
need to be checking this in the pickaxe_match() inner loop.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's hard to read this codepath at a glance and reason about exactly
what combination of -G and -S will compile either regexes or kwset,
and whether we'll then dispatch to "diff_grep" or "has_changes".
Then in the "--find-object" case we aren't using the callback
function, but were previously passing down "has_changes".
Refactor this code to exhaustively check "opts", it's now more obvious
what callback function (or none) we want under what mode.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test for the -G and -S pickaxe options and related options.
This test supports being run with GIT_TEST_LONG=1 to adjust the limit
on the number of commits from 1k to 10k. The 1k limit seems to hit a
good spot on git.git
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor contains() to do its assignments at the same time that it
does its declarations.
This code could have been refactored in ef90ab66e8 (pickaxe: use
textconv for -S counting, 2012-10-28) when a function call between the
declarations and assignments was removed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the pickaxe and pickaxe_opts fields next to each other again. In
a past life they'd been on adjacent lines, but when they got moved
from a global variable to the diff_options struct in 6b5ee137e5 (Diff
clean-up., 2005-09-21) they got split apart.
That split made sense at the time, the "char*" and "int" (flags)
options were being grouped, but we've long since abandoned that
pattern in the diff_options struct, and now it makes more sense to
group these together again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Neither the --pickaxe-all documentation nor --find-object's has ever
suggested that you can combine the two. See f506b8e8b5 (git log/diff:
add -G<regexp> that greps in the patch text, 2010-08-23) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04).
But we've silently tolerated it, which makes the logic in
diffcore_pickaxe() harder to reason about. Let's assert that we won't
have the two combined.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the -G and --pickaxe-regex options are combined we simply ignore
the --pickaxe-regex option. Let's die instead as suggested by our
documentation, since -G is always a regex.
When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23) neither the
documentation for --pickaxe-regex was updated accordingly, nor was
something like this assertion added.
Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
with -S, but have silently tolerated combining it with -G, let's die
instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a missing test for --no-pickaxe-regex. This has been an error ever
since before the -S or -G options were added, or since
7ae0b0cb65 (git-log (internal): more options., 2006-03-01).
The reason for adding this test is that Junio suggested in [1] in
response to a later test addition in this series that it might be good
to support --no-pickaxe-regex in combination with -G. This would allow
for fixed-string searching with -G, similr to grep's --fixed-strings
mode.
I agree that that would make sense if anyone would like to implement
it, but since it dies right now let's first add this test to assert
the existing long-standing behavior. We can always add support for
--[no-]pickaxe-regex in combination with -G at some later date.
1. http://lore.kernel.org/git/xmqqwnto9pt7.fsf@gitster.g
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test for the options sanity check added in 5e505257f2 (diff:
properly error out when combining multiple pickaxe options,
2018-01-04).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No test in our test suite checked for "log -S<pat>" being a fixed
string, as opposed to "log -S<pat> --pickaxe-regex". Let's test for
it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In diffgrep_consume() we generate a diff, and then advance past the
"+" or "-" at the start of the line for matching. This has been done
ever since the code was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23).
If we match "line" instead of "line + 1" no tests fail, i.e. we've got
zero coverage for whether any of our searches match the beginning of
the line or not. Let's add a test for this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the existing tests added in e0e7cb8080 (log -G: ignore
binary files, 2018-12-14) to use the --append option I added in
3373518cc8 (test-lib functions: add an --append option to
test_commit, 2021-01-12) and the --printf option added as part of an
in-flight topic of mine this commit depends on.
While I'm at it change some of the setup of the test to use a more
sensible pattern, e.g. setting up a temporary repo instead of creating
an orphan branch.
Since the -G and -S options will behave the same way with truncated
and removed content also change the "git rm" to emptying data.bin,
that's just catering to how test_commit works. The resulting test is
shorter.
See also f5d79bf7dd (tests: refactor a few tests to use "test_commit
--append", 2021-01-12) for prior similar refactoring.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The kwset optimization has not been used by grep since
48de2a768c (grep: remove the kwset optimization, 2019-07-01).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove various redundant or obsolete code from the test_create_repo()
function, and split up its use in test-lib.sh from what tests need
from it.
This leave us with a pass-through wrapper for "git init" in
test-lib-functions.sh, in test-lib.sh we have the same, except for
needing to redirect stdout/stderr, and emitting an error ourselves if
it fails. We don't need to error() ourselves when test_create_repo()
is invoked, as the invocation will be a part of a test's "&&"-chain.
Everything below this paragraph is a detailed summary of the history
of test_create_repo() explaining why it's safe to remove the various
things it was doing:
1. "mkdir -p" isn't needed because "git init" itself will create
leading directories if needed.
2. Since we're now a simple wrapper for "git init" we don't need to
check that we have only one argument. If someone wants to run
"test_create_repo --bare x" that's OK.
3. We won't ever hit that "Cannot setup test environment"
error.
Checking the test environment sanity when doing "git init" dates
back to eea420693b (t0000: catch trivial pilot errors.,
2005-12-10) and 2ccd2027b0 (trivial: check, if t/trash directory
was successfully created, 2006-01-05).
We can also see it in another form a bit later in my own
0d314ce834 (test-lib: use subshell instead of cd $new && .. && cd
$old, 2010-08-30).
But since 2006f0adae (t/test-lib: make sure Git has already been
built, 2012-09-17) we already check if we have a built git
earlier.
The one thing this was testing after that 2012 change was that
we'd just built "git", but not "git-init", but since
3af4c7156c (tests: respect GIT_TEST_INSTALLED when initializing
repositories, 2018-11-12) we invoke "git", not "git-init".
So all of that's been checked already, and we don't need to
re-check it here.
4. We don't need to move .git/hooks out of the way.
That dates back to c09a69a83e (Disable hooks during tests.,
2005-10-16), since then hooks became disabled by default in
f98f8cbac0 (Ship sample hooks with .sample suffix, 2008-06-24).
So the hooks were already disabled by default, but as can be seen
from "mkdir .git/hooks" changes various tests needed to re-setup
that directory. Now they no longer do.
This makes us implicitly depend on the default hooks being
disabled, which is a good thing. If and when we'd have any
on-by-default hooks (I see no reason we ever would) we'd want to
see the subtle and not so subtle ways that would break the test
suite.
5. We don't need to "cd" to the "$repo" directory at all anymore.
In the code being removed here we both "cd"'d to the repository
before calling "init", and did so in a subshell.
It's not important to do either, so both of those can be
removed. We cd'd because this code grew from test-lib.sh code
where we'd have done so already, see eedf8f97e5 (Abstract
test_create_repo out for use in tests., 2006-02-17), and later
"cd"'d inside a subshell since 0d314ce834 to avoid having to keep
track of an "old pwd" variable to cd back after the setup.
Being in the repository directory made moving the hooks around
easier (we wouldn't have to fully qualify the path). Since we're
not moving the hooks per #4 above we don't need to "cd" for that
reason either.
6. We can drop the --template argument and instead rely on the
GIT_TEMPLATE_DIR set to the same path earlier in test-lib.sh. See
8683a45d66 (Introduce GIT_TEMPLATE_DIR, 2006-12-19)
7. We only needed that ">&3 2>&4" redirection when invoked from
test-lib.sh.
We could still invoke test_create_repo() there, but as the
invocation is now trivial and we don't have a good reason to use
test_create_repo() elsewhere let's call "git init" there
ourselves.
8. We didn't need to resolve "git" as
"${GIT_TEST_INSTALLED:-$GIT_EXEC_PATH}/git$X" in test_create_repo(),
even for the use of test-lib.sh
PATH is already set up in test-lib.sh to start with
GIT_TEST_INSTALLED and/or GIT_EXEC_PATH before
test_create_repo() (now "git init") is called.. So we can simply
run "git" and rely on the PATH lookup choosing the right
executable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Arrange for the advice about naming the initial branch not to be shown
in the --verbose output of the test suite.
Since 675704c74d (init: provide useful advice about
init.defaultBranch, 2020-12-11) some tests have been very chatty with
repeated occurrences of this multi-line advice. Having it be this
verbose isn't helpful for anyone in the context of git's own test
suite, and it makes debugging tests that use their own "git init"
invocations needlessly distracting.
By setting the GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME variable early in
test-lib.sh itself we'll squash the warning not only for
test_create_repo(), as 675704c74d explicitly intended, but also for
other "git init" invocations.
And once we'd like to have this configuration set for all "git init"
invocations in the test suite we can get rid of the init.defaultBranch
configuration setting in test_create_repo(), as
repo_default_branch_name() in refs.c will take the GIT_TEST_* variable
over it being set.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reformat an argument list changed in 675704c74d (init: provide useful
advice about init.defaultBranch, 2020-12-11) to have the "-c" on the
same line as the argument it sets. This whitespace-only change makes
it easier to review a subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a use of $GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME added in
704fed9ea2 (tests: start moving to a different default main branch
name, 2020-10-23) to simply discover the initial branch name of a
repository set up in this function with "symbolic-ref --short".
That's something done in another test in 704fed9ea2, so doing it like
this seems like an omission, or rather an overly eager
search/replacement instead of fixing the test logic.
There are only three uses of the GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
variable in the test suite, this gets rid of one of those.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a --printf option to test_commit to allow writing to the file with
"printf" instead of "echo".
This is useful for writing "\n", "\0" etc., in particular in
combination with the --append option added in 3373518cc8 (test-lib
functions: add an --append option to test_commit, 2021-01-12).
I'm converting a few tests to use the new option rather than a manual
printf/add/commit combination to demonstrate its usefulness. While I'm
at it use "test_create_repo" where appropriate, and give the
first/second commit a meaningful/more conventional log message in
cases where no test cared about that message.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the setup of the describe tests to use test_commit when
possible. This makes use of the new --annotate option to test_commit.
Some of the setup here could simply be removed since the data being
created wasn't important to any of the subsequent tests, so I've done
so. E.g. assigning to the "one" variable was always useless, and just
checking that we can describe HEAD after the first commit wasn't
useful.
In the case of the "two" variable we could instead use the tag we just
created. See 5312ab11fb (Add describe test., 2007-01-13) for the
initial version of this code. There's other cases here like redundant
"test_tick" invocations, or the simplification of not echoing "X" to a
file we're about to tag as "x", now we just use "x" in both cases.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an --annotated option to test_commit to create annotated tags. The
tag will share the same message as the commit, and we'll call
test_tick before creating it (unless --notick) is provided.
There's quite a few tests that could be simplified with this
construct. I've picked one to convert in this change as a
demonstration.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 76b8b8d05c (test-lib functions: document arguments to test_commit,
2021-01-12) I added missing documentation to test_commit, but in less
than a month later in 3803a3a099 (t: add --no-tag option to
test_commit, 2021-02-09) we got another undocumented option. Let's fix
that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reword the documentation for "test_commit --append" added in my
3373518cc8 (test-lib functions: add an --append option to test_commit,
2021-01-12).
A follow-up commit will make the "echo" part of this configurable, and
in any case saying "echo >>" rather than ">>" was redundant.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop setting the GIT_TEST_FRAMEWORK_SELFTEST variable. This was originally needed
back in 4231d1ba99 (t0000: do not get self-test disrupted by
environment warnings, 2018-09-20).
It hasn't been needed since I deleted the relevant code in test-lib.sh
in c0eedbc009 (test-lib: remove check_var_migration, 2021-02-09), I
just didn't notice that it was set here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no point in creating a repository or directory only to decide
right afterwards that we're skipping all the tests. We can save
ourselves the redundant "git init" or "mkdir" and "rm -rf" in this
case.
We carry around the "$remove_trash" variable because if the directory
is unexpectedly gone at test_done time we'll still want to hit the
"trash directory already removed" error, but not if we never created
the trash directory. See df4c0d1a79 (test-lib: abort when can't
remove trash directory, 2017-04-20) for the addition of that error.
So let's partially revert 06478dab4c (test-lib: retire $remove_trash
variable, 2017-04-23) and move the decision about whether to skip all
tests earlier.
Let's also fix a bug that was with us since abc5d372ec (Enable
parallel tests, 2008-08-08): we would leak $remove_trash from the
environment. We don't want this to error out, so let's reset it to the
empty string first:
remove_trash=t GIT_SKIP_TESTS=t0001 ./t0001-init.sh
I tested this with --debug, see 4d0912a206 (test-lib.sh: do not barf
under --debug at the end of the test, 2017-04-24) for a bug we don't
want to re-introduce.
While I'm at it, let's move the HOME assignment to just before
test_create_repo, it could be lower, but it seems better to set it
before calling anything in test-lib-functions.sh
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "renamed" has two possible translations in many European
languages depending on whether one thing was renamed or two things were
renamed. Give translators freedom to alter any part of the message to
make it sound right in their language.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack -A -d" in a partial clone unnecessarily loosened
objects in promisor pack.
* rs/repack-without-loosening-promised-objects:
repack: avoid loosening promisor objects in partial clones
"git subtree" updates.
* ls/subtree: (30 commits)
subtree: be stricter about validating flags
subtree: push: allow specifying a local rev other than HEAD
subtree: allow 'split' flags to be passed to 'push'
subtree: allow --squash to be used with --rejoin
subtree: give the docs a once-over
subtree: have $indent actually affect indentation
subtree: don't let debug and progress output clash
subtree: add comments and sanity checks
subtree: remove duplicate check
subtree: parse revs in individual cmd_ functions
subtree: use "^{commit}" instead of "^0"
subtree: don't fuss with PATH
subtree: use "$*" instead of "$@" as appropriate
subtree: use more explicit variable names for cmdline args
subtree: use git-sh-setup's `say`
subtree: use `git merge-base --is-ancestor`
subtree: drop support for git < 1.7
subtree: more consistent error propagation
subtree: don't have loose code outside of a function
subtree: t7900: add porcelain tests for 'pull' and 'push'
...
SHA-256 transition.
* bc/hash-transition-interop-part-1:
hex: print objects using the hash algorithm member
hex: default to the_hash_algo on zero algorithm value
builtin/pack-objects: avoid using struct object_id for pack hash
commit-graph: don't store file hashes as struct object_id
builtin/show-index: set the algorithm for object IDs
hash: provide per-algorithm null OIDs
hash: set, copy, and use algo field in struct object_id
builtin/pack-redundant: avoid casting buffers to struct object_id
Use the final_oid_fn to finalize hashing of object IDs
hash: add a function to finalize object IDs
http-push: set algorithm when reading object ID
Always use oidread to read into struct object_id
hash: add an algo member to struct object_id
In previous changes, mailinfo has learnt to process lines that decoded
from base64 or quoted-printable, and ends with CRLF.
Let's teach "am" that new trick, too.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous changes, we've turned on warning for quoted CR in base64 or
quoted-printable email messages. Some projects see those quoted CR a lot,
they know that it happens most of the time, and they find it's desirable
to always strip those CR.
Those projects in question usually fall back to use other tools to handle
patches when receive such patches.
Let's help those projects handle those patches by stripping those
excessive CR.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous change, Git starts to warn for quoted CRLF in decoded
base64/QP email. Despite those warnings are usually helpful,
quoted CRLF could be part of some users' workflow.
Let's give them an option to turn off the warning completely.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When SMTP servers receive 8-bit email messages, possibly with only
LF as line ending, some of them decide to change said LF to CRLF.
Some mailing list softwares, when receive 8-bit email messages,
decide to encode those messages in base64 or quoted-printable.
If an email is transfered through above mail servers, then distributed
by such mailing list softwares, the recipients will receive an email
contains a patch mungled with CRLF encoded inside another encoding.
Thus, such CR (in CRLF) couldn't be dropped by "mailsplit".
Hence, the mailed patch couldn't be applied cleanly.
Such accidents have been observed in the wild [1].
Instead of silently rejecting those messages, let's give our users
some warnings if such CR (as part of CRLF) is found.
[1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of "%ch" is missing a space after "human style", before
the parenthetical remark. This description was introduced in b722d4560e
("pretty: provide human date format", 2021-04-25). That commit also
added "%ah", which does have the space already.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop the ")" at the end of this paragraph. There's a parenthetical
remark in this paragraph, but it's been closed on the line above.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Portability fix for command line completion script (in contrib/).
* si/zsh-complete-comment-fix:
work around zsh comment in __git_complete_worktree_paths
Further update the command line completion (in contrib/) for "git
stash".
* dl/complete-stash-updates:
git-completion.bash: consolidate cases in _git_stash()
git-completion.bash: use $__git_cmd_idx in more places
git-completion.bash: rename to $__git_cmd_idx
git-completion.bash: separate some commands onto their own line
The command line completion (in contrib/) for "git stash" has been
updated.
* dl/complete-stash:
git-completion.bash: use __gitcomp_builtin() in _git_stash()
git-completion.bash: extract from else in _git_stash()
git-completion.bash: pass $__git_subcommand_idx from __git_main()
Plug various leans reported by LSAN.
* ah/plugleaks:
builtin/rm: avoid leaking pathspec and seen
builtin/rebase: release git_format_patch_opt too
builtin/for-each-ref: free filter and UNLEAK sorting.
mailinfo: also free strbuf lists when clearing mailinfo
builtin/checkout: clear pending objects after diffing
builtin/check-ignore: clear_pathspec before returning
builtin/bugreport: don't leak prefixed filename
branch: FREE_AND_NULL instead of NULL'ing real_ref
bloom: clear each bloom_key after use
ls-files: free max_prefix when done
wt-status: fix multiple small leaks
revision: free remainder of old commit list in limit_list
"git rev-list" learns the "--filter=object:type=<type>" option,
which can be used to exclude objects of the given kind from the
packfile generated by pack-objects.
* ps/rev-list-object-type-filter:
rev-list: allow filtering of provided items
pack-bitmap: implement combined filter
pack-bitmap: implement object type filter
list-objects: implement object type filter
list-objects: support filtering by tag and commit
list-objects: move tag processing into its own function
revision: mark commit parents as NOT_USER_GIVEN
uploadpack.txt: document implication of `uploadpackfilter.allow`
"git rebase --[no-]reschedule-failed-exec" did not work well with
its configuration variable, which has been corrected.
* ab/rebase-no-reschedule-failed-exec:
rebase: don't override --no-reschedule-failed-exec with config
rebase tests: camel-case rebase.rescheduleFailedExec consistently
Dev support.
* ab/doc-lint:
docs: fix linting issues due to incorrect relative section order
doc lint: lint relative section order
doc lint: lint and fix missing "GIT" end sections
doc lint: fix bugs in, simplify and improve lint script
doc lint: Perl "strict" and "warnings" in lint-gitlink.perl
Documentation/Makefile: make doc.dep dependencies a variable again
Documentation/Makefile: make $(wildcard howto/*.txt) a var
"git add" and "git rm" learned not to touch those paths that are
outside of sparse checkout.
* mt/add-rm-in-sparse-checkout:
rm: honor sparse checkout patterns
add: warn when asked to update SKIP_WORKTREE entries
refresh_index(): add flag to ignore SKIP_WORKTREE entries
pathspec: allow to ignore SKIP_WORKTREE entries on index matching
add: make --chmod and --renormalize honor sparse checkouts
t3705: add tests for `git add` in sparse checkouts
add: include magic part of pathspec on --refresh error
Replace GIT_CONFIG_NOSYSTEM mechanism to decline from reading the
system-wide configuration file with GIT_CONFIG_SYSTEM that lets
users specify from which file to read the system-wide configuration
(setting it to an empty file would essentially be the same as
setting NOSYSTEM), and introduce GIT_CONFIG_GLOBAL to override the
per-user configuration in $HOME/.gitconfig.
* ps/config-global-override:
t1300: fix unset of GIT_CONFIG_NOSYSTEM leaking into subsequent tests
config: allow overriding of global and system configuration
config: unify code paths to get global config paths
config: rename `git_etc_config()`
"git log --format=..." placeholders learned %ah/%ch placeholders to
request the --date=human output.
* zh/pretty-date-human:
pretty: provide human date format
"git (branch|tag) --format=..." has been micro-optimized.
* zh/format-ref-array-optim:
ref-filter: reuse output buffer
ref-filter: get rid of show_ref_array_item
When we swapped the order of --3way fallback, we forgot to adjust
the message we give when the first method fails and the second
method is attempted (which used to be "direct application failed
hence we try 3way", now it is the other way around).
* jz/apply-3way-first-message-fix:
apply: adjust messages to account for --3way changes
When the reachability bitmap is in effect, the "do not lose
recently created objects and those that are reachable from them"
safety to protect us from races were disabled by mistake, which has
been corrected.
* jk/prune-with-bitmap-fix:
prune: save reachable-from-recent objects with bitmaps
pack-bitmap: clean up include_check after use
Tweak a few tests for "log --format=..." that show timestamps in
various formats.
* ab/pretty-date-format-tests:
pretty tests: give --date/format tests a better description
pretty tests: simplify %aI/%cI date format test
"git --config-env var=val cmd" weren't accepted (only
--config-env=var=val was).
* ps/config-env-option-with-separate-value:
git: support separate arg for `--config-env`'s value
git.txt: fix synopsis of `--config-env` missing the equals sign
The variable `matches` used to hold the return of a `dir_path_match()`
call that was removed in 95c11ecc73 ("Fix error-prone fill_directory()
API; make it only return matches", 2020-04-01). Now `matches` will
always hold 0, which is the value it's initialized with; and the
condition `matches != MATCHED_EXACTLY` will always evaluate to true. So
let's remove this unnecessary variable.
Interestingly, it seems that `matches != MATCHED_EXACTLY` was already
unnecessary before 95c11ecc73. That's because `remove_directories` is
always set to 1 when we have pathspecs; So, in the condition
`!remove_directories && matches != MATCHED_EXACTLY`, we would either:
- have pathspecs (or have been given `-d`) and ignore `matches` because
`remove_directories` is 1; or
- not have pathspecs (nor `-d`) and end up just checking that
`0 != MATCHED_EXACTLY`, as `matches` would never get reassigned
after its zero initialization (because there is no pathspec to match).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later change, mailinfo will learn more options, let's switch to our
robust parse_options framework before that step.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later change, we will use parse_option to parse mailinfo's options.
In mailinfo, both "-u", "-n", and "--encoding" try to set the same
field, with "-u" reset that field to some default value from
configuration variable "i18n.commitEncoding".
Let's delay the setting of that field until we finish processing all
options. By doing that, "i18n.commitEncoding" can be parsed on demand.
More importantly, it cleans the way for using parse_option.
This change introduces some inconsistent brackets "{}" in "if/else if"
construct, however, we will rewrite them in the next few changes.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interactive machinery does not obey --dry-run. Die appropriately
if both flags are passed.
Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the logic of the i18n functions I added in 5e9637c629 (i18n:
add infrastructure for translating Git with gettext, 2011-11-18) to
use pass-through functions when NO_GETTEXT is defined.
This speeds up the compilation time of commands that use this library
when NO_GETTEXT=Y is in effect. Loading it and POSIX.pm is around 20ms
on my machine, whereas it takes 2ms to just instantiate perl itself.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regenerate the *.pm files in perl/build/* if the
NO_PERL_CPAN_FALLBACKS flag added to the *.pm files in
1aca69c019 (perl Git::LoadCPAN: emit better errors under
NO_PERL_CPAN_FALLBACKS, 2018-03-03) is changed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the logic to generate perl/build/* to regenerate those files if
GIT-PERL-DEFINES changes. This ensures that e.g. changing localedir
will result in correctly re-generated files.
I don't think that ever worked. The brokenness pre-dates my
20d2a30f8f (Makefile: replace perl/Makefile.PL with simple make
rules, 2017-12-10).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 07d90eadb5 (Makefile: add Perl runtime prefix support,
2018-04-10) we have been declaring PERL_DEFINES right after assigning
to it, with the effect that the first PERL_DEFINES was ignored.
That bug didn't matter in practice since the first line had all the
same variables as the second, so we'd correctly re-generate
everything. It just made for confusing reading.
Let's remove that first assignment, and while we're at it split these
across lines to make them more maintainable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the definition of the structure around the open/close/read
functions introduced in 46bf043807 (streaming: a new API to read from
the object store, 2011-05-11) to instead populate "close" and "read"
members in the "struct git_istream".
This gets us rid of an extra pointer deference, and I think makes more
sense. The "close" and "read" functions are the primary interface to
the stream itself.
Let's also populate a "open" callback in the same struct. That's now
used by open_istream() after istream_source() decides what "open"
function should be used. This isn't needed to get rid of the
"stream_vtbl" variables, but makes sense for consistency.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the streaming interface to stop passing around the "struct
object_info" the open() functions.
As seen in 7ef2d9a260 (streaming: read non-delta incrementally from a
pack, 2011-05-13) which introduced the "st->u.in_pack" assignments
being changed here only the open_istream_pack_non_delta() path need
these.
So let's instead do this when preparing the selected callback in the
istream_source() function. This might also allow the compiler to
reduce the lifetime of the "oi" variable, as we've moved it from
"git_istream()" to "istream_source()".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the {open,close,read}_method_decl() macros added in
46bf043807 (streaming: a new API to read from the object store,
2011-05-11) in favor of inlining the definition of the arguments of
these functions.
Since we'll end up using them via the "{open,close,read}_istream_fn"
types we don't gain anything in the way of compiler checking by using
these macros, and as of preceding commits we no longer need to declare
these argument lists twice. So declaring them at a distance just
serves to make the code less readable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the indirection of discovering a function pointer to use via an
enum and virtual table. This refactors code added in
46bf043807 (streaming: a new API to read from the object store,
2011-05-11).
We can instead simply return an "open_istream_fn" for use from the
"istream_source()" selector function directly. This allows us to get
rid of the "incore", "loose" and "pack_non_delta" enum
variables. We'll return the functions instead.
The "stream_error" variable in that enum can likewise go in favor of
returning NULL, which is what the open_istream() was doing when it got
that value anyway.
We can thus remove the entire enum, and the "open_istream_tbl" virtual
table that (indirectly) referenced it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change code added in 46bf043807 (streaming: a new API to read from
the object store, 2011-05-11) to avoid forward declarations of the
functions it uses. We can instead move this code to the bottom of the
file, and thus avoid the open_method_decl() calls.
Aside from the addition of the "static helpers[...]" comment being
added here, and the removal of the forward declarations this is a
move-only change.
The style of the added "static helpers[...]" comment isn't in line
with our usual coding style, but is consistent with several other
comments used in this file, so let's use that style consistently here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the set_index_sparse_config() function by folding it into
set_sparse_index_config(), which was its only user.
Since 122ba1f7b5 (sparse-checkout: toggle sparse index from builtin,
2021-03-30) the flow of this code hasn't made much sense, we'd get
"enabled" in set_sparse_index_config(), proceed to call
set_index_sparse_config() with it.
There we'd call prepare_repo_settings() and set
"repo->settings.sparse_index = 1", only to needlessly call
prepare_repo_settings() again in set_sparse_index_config() (where it
would early abort), and finally setting "repo->settings.sparse_index =
enabled".
Instead we can just call prepare_repo_settings() once, and set the
variable to "enabled" in the first place.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For every new branch that git-p4 imports, it needs to find the commit
where it branched off its parent branch. While p4 doesn't record this
information explicitly, the first changelist on a branch is usually an
identical copy of the parent branch.
The method searchParent() tries to find a commit in the history of the
given "parent" branch whose tree exactly matches the initial changelist
of the new branch, "target". The code iterates through the parent
commits and compares each of them to this initial changelist using
diff-tree.
Since we already know the tree object name we are looking for, spawning
diff-tree for each commit is wasteful.
Use the "--format" option of "rev-list" to find out the tree object name
of each commit in the history, and find the tree whose name is exactly
the same as the tree of the target commit to optimize this.
This results in a considerable speed-up, at least on Windows. On one
Windows machine with a fairly large repository of about 16000 commits in
the parent branch, the current code takes over 7 minutes, while the new
code only takes just over 10 seconds for the same changelist:
Before:
$ time git p4 sync
Importing from/into multiple branches
Depot paths: //depot
Importing revision 31274 (100.0%)
Updated branches: b1
real 7m41.458s
user 0m0.000s
sys 0m0.077s
After:
$ time git p4 sync
Importing from/into multiple branches
Depot paths: //depot
Importing revision 31274 (100.0%)
Updated branches: b1
real 0m10.235s
user 0m0.000s
sys 0m0.062s
Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When importing a branch from p4, git-p4 searches the history of the parent
branch for the branch point. The test for the complex branch structure
ensures all files have the expected contents, but doesn't examine the
branch structure.
Check for the correct branch structure by making sure that the initial
commit on each branch is empty. This ensures that the initial commit's
parent is indeed the correct branch-off point.
Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdl_prepare_env() calls xdl_classify_record() which arranges for the
hashes of non-matching lines to be different so lines can be tested
for equality by comparing just their hashes.
This reduces the time taken to calculate the diff of v2.28.0 to
v2.29.0 by ~3-4%.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero
length match and try a new match. This is safe as posix regular
expressions always return the longest available match so a zero length
match means there are no longer matches available from the current
position.
Commit bf82940dbf (color-words: enable REG_NEWLINE to help user,
2009-01-17) prevented matching newlines in negated character classes
but it is still possible for the user to have an explicit newline
match in the regex which could cause a zero length match.
One could argue that having explicit newline matches or using '*'
rather than '+' are user errors but it seems to be better to work
round them than produce inaccurate diffs.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have tests for the basic parallel-checkout operations. But
this code can also run be executed by other commands, such as
git-read-tree and git-sparse-checkout, which are currently not tested
with multiple workers. To promote a wider test coverage without
duplicating tests:
1. Add the GIT_TEST_CHECKOUT_WORKERS environment variable, to optionally
force parallel-checkout execution during the whole test suite.
2. Set this variable (with a value of 2) in the second test round of our
linux-gcc CI job. This round runs `make test` again with some
optional GIT_TEST_* variables enabled, so there is no additional
overhead in exercising the parallel-checkout code here.
Note that tests checking out less than two parallel-eligible entries
will fall back to the sequential mode. Nevertheless, it's still a good
exercise for the parallel-checkout framework as the fallback codepath
also writes the queued entries using the parallel-checkout functions
(only without spawning any worker).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests to populate the working tree during clone and checkout using
sequential and parallel mode, to confirm that they produce identical
results. Also test basic checkout mechanics, such as checking for
symlinks in the leading directories and the abidance to --force.
Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But they will also be used by other
parallel-checkout tests in the following patches.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests to confirm that the `struct conv_attrs` data is correctly
passed from the main process to the workers, and that they can properly
convert the blobs before writing them to the working tree.
Also check that parallel-ineligible entries, such as regular files that
require external filters, are correctly smudge and written when
parallel-checkout is enabled.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow checkout-index to use the parallel checkout framework, honoring
the checkout.workers configuration.
There are two code paths in checkout-index which call
`checkout_entry()`, and thus, can make use of parallel checkout:
`checkout_file()`, which is used to write paths explicitly given at the
command line; and `checkout_all()`, which is used to write all paths in
the index, when the `--all` option is given.
In both operation modes, checkout-index doesn't abort immediately on a
`checkout_entry()` failure. Instead, it tries to check out all remaining
paths before exiting with a non-zero exit code. To keep this behavior
when parallel checkout is being used, we must allow
`run_parallel_checkout()` to try writing the queued entries before we
exit, even if we already got an error code from a previous
`checkout_entry()` call.
However, `checkout_all()` doesn't return on errors, it calls `exit()`
with code 128. We could make it call `run_parallel_checkout()` before
exiting, but it makes the code easier to follow if we unify the exit
path for both checkout-index modes at `cmd_checkout_index()`, and let
this function take care of the interactions with the parallel checkout
API. So let's do that.
With this change, we also have to consider whether we want to keep using
128 as the error code for `git checkout-index --all`, while we use 1 for
`git checkout-index <path>` (even when the actual error is the same).
Since there is not much value in having code 128 only for `--all`, and
there is no mention about it in the docs (so it's unlikely that changing
it will break any existing script), let's make both modes exit with code
1 on `checkout_entry()` errors.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following patch will add tests outside t0028 which will also need to
re-encode some strings. Extract the auxiliary encoding functions from
t0028 to a common lib file so that they can be reused.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests to confirm that path collisions are properly detected by
checkout workers, both to avoid race conditions and to report colliding
entries on clone.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
a code path that doesn't yet support parallel checkout because it calls
checkout_entry() directly, instead of unpack_trees(). Let's add parallel
checkout support for this code path too.
The transient cache entries allocated in checkout_merged() are now
allocated in a mem_pool which is only discarded after parallel checkout
finishes. This is done because the entries need to be valid when
run_parallel_checkout() is called.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A HTTP-clone test introduced in 4fe788b1b0 ("builtin/clone.c: add
--reject-shallow option", 2021-04-01) only works in protocol v2, but is
not marked as such.
The aforementioned patch implements --reject-shallow for a variety of
situations, but usage of a protocol that requires a remote helper is not
one of them. (Such an implementation would require extending the remote
helper protocol to support the passing of a "reject shallow" option, and
then teaching it to both protocol-speaking ends.)
For now, to make it pass when GIT_TEST_PROTOCOL_VERSION=0 is passed, add
"-c protocol.version=2". A more complete solution would be either to
augment the remote helper protocol to support this feature or to return
a fatal error when using --reject-shallow with a protocol that uses a
remote helper.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new testcase where one side of history renames:
olddir/ -> newdir/
and the other side of history renames:
olddir/a -> olddir/alpha
When using merge.directoryRenames=true, it seems logical to expect the
file to end up at newdir/alpha. Unfortunately, both merge-recursive and
merge-ort currently see this as a rename/rename conflict:
olddir/a -> newdir/a
vs.
olddir/a -> newdir/alpha
Suggesting that there's some extra logic we probably want to add
somewhere to allow this case to run without triggering a conflict. For
now simply document this known issue.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[PATCH]: contrib/completion/git-completion.bash, there is a construct
where comment lines are placed between the command that is on
the upstream of a pipe and the command that is on the downstream
of a pipe in __git_complete_worktree_paths function.
Unfortunately, this script is also used by Zsh completion, but
Zsh mishandles this construct when "interactive_comments" option is not
set (by default it is off on macOS), resulting in a breakage:
$ git worktree remove [TAB]
$ git worktree remove __git_complete_worktree_paths:7: command not found: #
Move the comment, even though it explains what happens on the
downstream of the pipe and logically belongs where it is right
now, before the entire pipeline, to work around this problem.
Signed-off-by: Sardorbek Imomaliev <sardorbek.imomaliev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `trailer.<token>.command` configuration variable
specifies a command (run via the shell, so it does not have
to be a single name or path to the command, but can be a
shell script), and the first occurrence of substring $ARG is
replaced with the value given to the `interpret-trailer`
command for the token in a '--trailer <token>=<value>' argument.
This has three downsides:
* The use of $ARG in the mechanism misleads the users that
the value is passed in the shell variable, and tempt them
to use $ARG more than once, but that would not work, as
the second and subsequent $ARG are not replaced.
* Because $ARG is textually replaced without regard to the
shell language syntax, even '$ARG' (inside a single-quote
pair), which a user would expect to stay intact, would be
replaced, and worse, if the value had an unmatched single
quote (imagine a name like "O'Connor", substituted into
NAME='$ARG' to make it NAME='O'Connor'), it would result in
a broken command that is not syntactically correct (or
worse).
* The first occurrence of substring `$ARG` will be replaced
with the empty string, in the command when the command is
first called to add a trailer with the specified <token>.
This is a bad design, the nature of automatic execution
causes it to add a trailer that we don't expect.
Introduce a new `trailer.<token>.cmd` configuration that
takes higher precedence to deprecate and eventually remove
`trailer.<token>.command`, which passes the value as an
argument to the command. Instead of "$ARG", users can
refer to the value as positional argument, $1, in their
scripts. At the same time, in order to allow
`git interpret-trailers` to better simulate the behavior
of `git command -s`, 'trailer.<token>.cmd' will not
automatically execute.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the original documentation of `trailer.<token>.command`,
some descriptions are easily misunderstood. So let's modify
it to increase its readability.
In addition, clarify that `$ARG` in command can only be
replaced once.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We stopped allowing symlinks for .gitmodules files in 10ecfa7649
(verify_path: disallow symlinks in .gitmodules, 2018-05-04), and we
stopped following symlinks for .gitattributes, .gitignore, and .mailmap
in the commits from 204333b015 (Merge branch 'jk/open-dotgitx-with-nofollow',
2021-03-22). The reasons are discussed in detail there, but we never
adjusted the documentation to let users know.
This hasn't been a big deal since the point is that such setups were
mildly broken and thought to be unusual anyway. But it certainly doesn't
hurt to be clear and explicit about it.
Suggested-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the commits merged in via 204333b015 (Merge branch
'jk/open-dotgitx-with-nofollow', 2021-03-22), we stopped following
symbolic links for .gitattributes, .gitignore, and .mailmap files.
Let's teach fsck to warn that these symlinks are not going to do
anything. Note that this is just a warning, and won't block the objects
via transfer.fsckObjects, since there are reported to be cases of this
in the wild (and even once fixed, they will continue to exist in the
commit history of those projects, but are not particularly dangerous).
Note that we won't add these to the existing gitmodules block in the
fsck code. The logic for gitmodules is a bit more complicated, as we
also check the content of non-symlink instances we find. But for these
new files, there is no content check; we're just looking at the name and
mode of the tree entry (and we can avoid even the complicated name
checks in the common case that the mode doesn't indicate a symlink).
We can reuse the test helper function we defined for .gitmodules, though
(it needs some slight adjustments for the fsck error code, and because
we don't block these symlinks via verify_path()).
Note that I didn't explicitly test the transfer.fsckObjects case here
(nor does the existing .gitmodules test that it blocks a push). The
translation of fsck severities to outcomes is covered in general in
t5504.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).
Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).
I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.
Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t7450 we check that both verify_path() and fsck catch malformed
.gitmodules entries in trees. However, we don't check that we catch
filesystem-equivalent forms of these (e.g., ".GITMOD~1" on Windows).
Our name-matching functions are exercised well in t0060, but there's
nothing to test that we correctly call the matching functions from the
actual fsck and verify_path() code.
So instead of testing just .gitmodules, let's repeat our tests for a few
basic cases. We don't need to be exhaustive here (t0060 handles that),
but just make sure we hit one name of each type.
Besides pushing the tests into a function that takes the path as a
parameter, we'll need to do a few things:
- adjust the directory name to accommodate the tests running multiple
times
- set core.protecthfs for index checks. Fsck always protects all types
by default, but we want to be able to exercise the HFS routines on
every system. Note that core.protectntfs is already the default
these days, but it doesn't hurt to explicitly label our need for it.
- we'll also take the filename ("gitmodules") as a parameter. All
calls use the same name for now, but a future patch will extend this
to handle other .gitfoo files. Note that our fake-content symlink
destination is somewhat .gitmodules specific. But it isn't necessary
for other files (which don't do a content check). And it happens to
be a valid attribute and ignore file anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) made it impossible to load a symlink .gitmodules file into
the index. However, there are no tests of this behavior. Let's make sure
this case is covered. We can easily reuse the test setup created by
the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
2018-05-04).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This script has already expanded beyond its original intent of ".. in
submodule names" to include other malicious submodule bits. Let's update
the name and description to reflect that, as well as the fact that we'll
soon be adding similar tests for other dotfiles (.gitattributes, etc).
We'll also renumber it to move it out of the group of submodule-specific
tests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many calls to report() in fsck_tree() are kept on a single line and are
quite long. Most were pretty big to begin with, but have gotten even
longer over the years as we've added more parameters. Let's accept the
churn of wrapping them in order to conform to our usual line limits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
fsck_tree(), and we pass it to the report() function when we find
problems. However, that is shadowed within the tree-walking loop by the
existing "oid" variable which we use to store the oid of each tree
entry. As a result, we may report the wrong oid for some problems we
detect within the loop (the entry oid, instead of the tree oid).
Our tests didn't catch this because they checked only that we found the
expected fsck problem, not that it was attached to the correct object.
Let's rename both variables in the function to avoid confusion. This
makes the diff a little noisy (e.g., all of the report() calls outside
the loop were already correct but need to be touched), but makes sure we
catch all cases and will avoid similar confusion in the future.
And we can update the test to be a bit more specific and catch this
problem.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since GETTEXT_POISON does not exist anymore, there is no point warning
people about whether we should use test_i18ngrep. This is doubly
confusing because the comment was describing why it was OK to use grep,
but it got caught up in the mass conversion of 674ba34038 (fsck: mark
strings for translation, 2018-11-10).
Note there are other uses of test_i18ngrep in this script which are now
obsolete; I'll save those for a mass-cleanup. My goal here was just to
fix the confusing comment in code I'm about to refactor.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Format-patch doesn't have a way to format merges in a way that can be
applied by git-am (or any other tool), and so it just omits them.
However, this may be a surprising implication for users who are not well
versed in how the tool works. Let's add a note to the documentation
making this more clear.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A negative delta depth makes no sense, and the code is not prepared to
handle it. If passed "--depth=-1" on the command line, then this line
from break_delta_chains():
cur->depth = (total_depth--) % (depth + 1);
triggers a divide-by-zero. This is undefined behavior according to the C
standard, but on POSIX systems results in SIGFPE killing the process.
This is certainly one way to inform the use that the command was
invalid, but it's a bit friendlier to just treat it as "don't allow any
deltas", which we already do for --depth=0.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd expect this to cleanly produce no deltas at all (as opposed to
getting confused by an out-of-bounds value), and it does.
Note we have to adjust our max_chain test helper, which expected to find
at least one delta.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A negative window size makes no sense, and the code in find_deltas() is
not prepared to handle it. If you pass "-1", for example, we end up
generate a 0-length array of "struct unpacked", but our loop assumes it
has at least one entry in it (and we end up reading garbage memory).
We could complain to the user about this, but it's more forgiving to
just clamp it to 0, which means "do not find any deltas at all". The
0-case is already tested earlier in the script, so we'll make sure this
does the same thing.
Reported-by: Yiyuan guo <yguoaz@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pack a set of objects both with and without --window=0, assuming that
the 0-length window will cause us not to produce any deltas. Let's
confirm that this is the case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first set of tests in t5300 goes back to 2005, and doesn't use some
of our customary style and tools these days. In preparation for touching
them, let's modernize a few things:
- titles go on the line with test_expect_success, with a hanging
open-quote to start the test body
- test bodies should be indented with tabs
- opening braces for shell blocks in &&-chains go on their own line
- no space between redirect operators and files (">foo", not "> foo")
- avoid doing work outside of test blocks; in this case, we can stick
the setup of ".git2" into the appropriate blocks
- avoid modifying and then cleaning up the environment or current
directory by using subshells and "git -C"
- this test does a curious thing when testing the unpacking: it sets
GIT_OBJECT_DIRECTORY, and then does a "git init" in the _original_
directory, creating a weird mixed situation. Instead, it's much
simpler to just "git init --bare" a new repository to unpack into,
and check the results there. I renamed this "git2" instead of
".git2" to make it more clear it's a separate repo.
- we can observe that the bodies of the no-delta, ref_delta, and
ofs_delta cases are all virtually identical except for the pack
creation, and factor out shared helper functions. I collapsed "do
the unpack" and "check the results of the unpack" into a single
test, since that makes the expected lifetime of the "git2" temporary
directory more clear (that also lets us use test_when_finished to
clean it up). This does make the "-v" output slightly less useful,
but the improvement in reading the actual test code makes it worth
it.
- I dropped the "pwd" calls from some tests. These don't do anything
functional, and I suspect may have been an aid for debugging when
the script was more cavalier about leaving the working directory
changed between tests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
01d3a526 (t0000: check whether the shell supports the "local"
keyword, 2017-10-26) raised a test balloon to see if those who build
and test Git use a platform with a shell that lacks support for the
"local" keyword. After two years, 7f0b5908 (t0000: reword comments
for "local" test, 2019-08-08) documented that "local" keyword, even
though is outside POSIX, is allowed in our test scripts.
Let's write it in the CodingGuidelines, too. It might be tempting
to allow it in scripted Porcelains (we have avoided getting them
contaminiated by "local" so far), but they are on their way out and
getting rewritten in C.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rewrite of git-merge from shell to C in 1c7b76be7d (Build in merge,
2008-07-07) accidentally transformed the message:
Already up-to-date. (nothing to squash)
to:
(nothing to squash)Already up-to-date.
due to reversed printf() arguments. This problem has gone unnoticed
despite being touched over the years by 7f87aff22c (Teach/Fix pull/fetch
-q/-v options, 2008-11-15) and bacec47845 (i18n: git-merge basic
messages, 2011-02-22), and tangentially by bef4830e88 (i18n: merge: mark
messages for translation, 2016-06-17) and 7560f547e6 (treewide: correct
several "up-to-date" to "up to date", 2017-08-23).
Fix it by restoring the message to its intended order. While at it, help
translators out by avoiding "sentence Lego".
[es: rewrote commit message]
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although the various "Already up to date" messages resulting from merge
attempts share identical phrasing, they use a mix of punctuation ranging
from "." to "!" and even "Yeeah!", which leads to extra work for
translators. Ease the job of translators by settling upon "." as
punctuation for all such messages.
While at it, take advantage of printf_ln() to further ease the
translation task so translators need not worry about line termination,
and fix a case of missing line termination in the (unused)
merge_ort_nonrecursive() function.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as
$ git submodule update --quiet --init --depth=1
involving shallow clones, call the shell function fetch_in_submodule, which
in turn invokes git fetch. Pass the --quiet option onward there.
Signed-off-by: Nicholas Clark <nick@ccl4.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify that pathnames recorded in Git trees are most often (but
not necessarily) encoded in UTF-8.
* ab/pathname-encoding-doc:
doc: clarify the filename encoding in git diff
Effort to make the command line completion (in contrib/) safe with
"set -u" continues.
* vs/completion-with-set-u:
completion: avoid aliased command lookup error in nounset mode
Show errno in the trace output in the error codepath that calls
read_raw_ref method.
* hn/refs-trace-errno:
refs: print errno for read_raw_ref if GIT_TRACE_REFS is set
The checkout machinery has been taught to perform the actual
write-out of the files in parallel when able.
* mt/parallel-checkout-part-2:
parallel-checkout: add design documentation
parallel-checkout: support progress displaying
parallel-checkout: add configuration options
parallel-checkout: make it truly parallel
unpack-trees: add basic support for parallel checkout
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
The prefetch task in "git maintenance" assumed that "git fetch"
from any remote would fetch all its local branches, which would
fetch too much if the user is interested in only a subset of
branches there.
* ds/maintenance-prefetch-fix:
maintenance: respect remote.*.skipFetchAll
maintenance: use 'git fetch --prefetch'
fetch: add --prefetch option
maintenance: simplify prefetch logic
"git push --quiet --set-upstream" was not quiet when setting the
upstream branch configuration, which has been corrected.
* ow/push-quiet-set-upstream:
transport: respect verbosity when setting upstream
When packet_write() fails, we gave an extra error message
unnecessarily, which has been corrected.
* mt/pkt-write-errors:
pkt-line: do not report packet write errors twice
Handling of "promisor packs" that allows certain objects to be
missing and lazily retrievable has been optimized (a bit).
* jk/promisor-optim:
revision: avoid parsing with --exclude-promisor-objects
lookup_unknown_object(): take a repository argument
is_promisor_object(): free tree buffer after parsing
Commit e4c7b33747 ("bisect--helper: reimplement `bisect_skip` shell
function in C", 2021-02-03), as part of the shell-to-C conversion,
forgot to read the 'terms' file (.git/BISECT_TERMS) during the new
'bisect skip' command implementation. As a result, the 'bisect skip'
command will use the default 'bad'/'good' terms. If the bisection
terms have been set to non-default values (for example by the
'bisect start' command), then the 'bisect skip' command will fail.
In order to correct this problem, we insert a call to the get_terms()
function, which reads the non-default terms from that file (if set),
in the '--bisect-skip' command implementation of 'bisect--helper'.
Also, add a test[1] to protect against potential future regression.
[1] https://lore.kernel.org/git/xmqqim45h585.fsf@gitster.g/T/#m207791568054b0f8cf1a3942878ea36293273c7d
Reported-by: Trygve Aaberge <trygveaa@gmail.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The backslash character is not a valid part of a file name on Windows.
If, in Windows, Git attempts to write a file that has a backslash
character in the filename, it will be incorrectly interpreted as a
directory separator.
This caused CVE-2019-1354 in MinGW, as this behaviour can be manipulated
to cause the checkout to write to files it ought not write to, such as
adding code to the .git/hooks directory. This was fixed by e1d911dd4c
(mingw: disallow backslash characters in tree objects' file names,
2019-09-12). However, the vulnerability also exists in Cygwin: while
Cygwin mostly provides a POSIX-like path system, it will still interpret
a backslash as a directory separator.
To avoid this vulnerability, CVE-2021-29468, extend the previous fix to
also apply to Cygwin.
Similarly, extend the test case added by the previous version of the
commit. The test suite doesn't have an easy way to say "run this test
if in MinGW or Cygwin", so add a new test prerequisite that covers both.
As well as checking behaviour in the presence of paths containing
backslashes, the existing test also checks behaviour in the presence of
paths that differ only by the presence of a trailing ".". MinGW follows
normal Windows application behaviour and treats them as the same path,
but Cygwin more closely emulates *nix systems (at the expense of
compatibility with native Windows applications) and will create and
distinguish between such paths. Gate the relevant bit of that test
accordingly.
Reported-by: RyotaK <security@ryotak.me>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While not documented as such, many of the top-level options like
`--git-dir` and `--work-tree` support two syntaxes: they accept both an
equals sign between option and its value, and they do support option and
value as two separate arguments. The recently added `--config-env`
option only supports the syntax with an equals sign.
Mitigate this inconsistency by accepting both syntaxes and add tests to
verify both work.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing `git -h`, then the `--config-env` documentation rightly
lists the option as requiring an equals between the option and its
argument: this is the only currently supported format. But the git(1)
manpage incorrectly lists the option as taking a space in between.
Fix the issue by adding the missing space.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-of-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" specifically calls out when it is falling back to 3way
merge application. Since the order changed to preferring 3way and
falling back to direct application, continue that behavior by
printing whenever 3way fails and git has to fall back.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pass our prune expiration to mark_reachable_objects(), which will
traverse not only the reachable objects, but consider any recent ones as
tips for reachability; see d3038d22f9 (prune: keep objects reachable
from recent objects, 2014-10-15) for details.
However, this interacts badly with the bitmap code path added in
fde67d6896 (prune: use bitmaps for reachability traversal, 2019-02-13).
If we hit the bitmap-optimized path, we return immediately to avoid the
regular traversal, accidentally skipping the "also traverse recent"
code.
Instead, we should do an if-else for the bitmap versus regular
traversal, and then follow up with the "recent" traversal in either
case. This reuses the "rev_info" for a bitmap and then a regular
traversal, but that should work OK (the bitmap code clears the pending
array in the usual way, just like a regular traversal would).
Note that I dropped the comment above the regular traversal here. It
has little explanatory value, and makes the if-else logic much harder to
read.
Here are a few variants that I rejected:
- it seems like both the reachability and recent traversals could be
done in a single traversal. This was rejected by d3038d22f9 (prune:
keep objects reachable from recent objects, 2014-10-15), though the
balance may be different when using bitmaps. However, there's a
subtle correctness issue, too: we use revs->ignore_missing_links for
the recent traversal, but not the reachability one.
- we could try using bitmaps for the recent traversal, too, which
could possibly improve performance. But it would require some fixes
in the bitmap code, which uses ignore_missing_links for its own
purposes. Plus it would probably not help all that much in practice.
We use the reachable tips to generate bitmaps, so those objects are
likely not covered by bitmaps (unless they just became unreachable).
And in general, we expect the set of unreachable objects to be much
smaller anyway, so there's less to gain.
The test in t5304 detects the bug and confirms the fix.
I also beefed up the tests in t6501, which covers the mtime-checking
code more thoroughly, to handle the bitmap case (in addition to just
"loose" and "packed" cases). Interestingly, this test doesn't actually
detect the bug, because it is running "git gc", and not "prune"
directly. And "gc" will call "repack" first, which does not suffer the
same bug. So the old-but-reachable-from-recent objects get scooped up
into the new pack along with the actually-recent objects, which gives
both a recent mtime. But it seemed prudent to get more coverage of the
bitmap case for related code.
Reported-by: David Emett <dave@sp4m.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a bitmap walk has to traverse (to fill in non-bitmapped objects),
we use rev_info's include_check mechanism to let us stop the traversal
early. But after setting the function and its data parameter, we never
clean it up. This means that if the rev_info is used for a subsequent
traversal without bitmaps, it will unexpectedly call into our
include_check function (worse, it will do so pointing to a now-defunct
stack variable in include_check_data, likely resulting in a segfault).
There's no code which does this now, but it's an accident waiting to
happen. Let's clean up after ourselves in the bitmap code.
Reported-by: David Emett <dave@sp4m.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't silently ignore a flag that's invalid for a given subcommand. The
user expected it to do something; we should tell the user that they are
mistaken, instead of surprising the user.
It could be argued that this change might break existing users. I'd
argue that those existing users are already broken, and they just don't
know it. Let them know that they're broken.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git subtree split' lets you specify a rev other than HEAD. 'git push'
lets you specify a mapping between a local thing and a remot ref. So
smash those together, and have 'git subtree push' let you specify which
local thing to run split on and push the result of that split to the
remote ref.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'push' does a 'split' internally, but it doesn't pass flags through to the
'split'. This is silly, if you need to pass flags to 'split', then it
means that you can't use 'push'!
So, have 'push' accept 'split' flags, and pass them through to 'split'.
Add tests for this by copying split's tests with minimal modification.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Besides being a genuinely useful thing to do, this also just makes sense
and harmonizes which flags may be used when. `git subtree split
--rejoin` amounts to "automatically go ahead and do a `git subtree
merge` after doing the main `git subtree split`", so it's weird and
arbitrary that you can't pass `--squash` to `git subtree split --rejoin`
like you can `git subtree merge`. It's weird that `git subtree split
--rejoin` inherits `git subtree merge`'s `--message` but not `--squash`.
Reconcile the situation by just having `split --rejoin` actually just
call `merge` internally (or call `add` instead, as appropriate), so it
can get access to the full `merge` behavior, including `--squash`.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just went through the docs looking for anything inaccurate or that can
be improved.
In the '-h' text, in the man page synopsis, and in the man page
description: Normalize the ordering of the list of sub-commands: 'add',
'merge', 'split', 'pull', 'push'. This allows us to kinda separate the
lower-level add/merge/split from the higher-level pull/push.
'-h' text:
- correction: Indicate that split's arg is optional.
- clarity: Emphasize that 'pull' takes the 'add'/'merge' flags.
man page:
- correction: State that all subcommands take options (it seemed to
indicate that only 'split' takes any options other than '-P').
- correction: 'split' only guarantees that the results are identical if
the flags are identical.
- correction: The flag is named '--ignore-joins', not '--ignore-join'.
- completeness: Clarify that 'push' always operates on HEAD, and that
'split' operates on HEAD if no local commit is given.
- clarity: In the description, when listing commands, repeat what their
arguments are. This way the reader doesn't need to flip back and
forth between the command description and the synopsis and the full
description to understand what's being said.
- clarity: In the <variables> used to give command arguments, give
slightly longer, descriptive names. Like <local-commit> instead of
just <commit>.
- clarity: Emphasize that 'pull' takes the 'add'/'merge' flags.
- style: In the synopsis, list options before the subcommand. This
makes things line up and be much more readable when shown
non-monospace (such as in `make html`), and also more closely matches
other man pages (like `git-submodule.txt`).
- style: Use the correct syntax for indicating the options ([<options>]
instead of [OPTIONS]).
- style: In the synopsis, separate 'pull' and 'push' from the other
lower-level commands. I think this helps readability.
- style: Code-quote things in prose that seem like they should be
code-quoted, like '.gitmodules', flags, or full commands.
- style: Minor wording improvements, like more consistent mood (many
of the command descriptions start in the imperative mood and switch
to the indicative mode by the end). That sort of thing.
- style: Capitalize "ID".
- style: Remove the "This option is only valid for XXX command" remarks
from each option, and instead rely on the section headings.
- style: Since that line is getting edited anyway, switch "behaviour" to
American "behavior".
- style: Trim trailing whitespace.
`todo`:
- style: Trim trailing whitespace.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the $indent variable is just used to track how deeply we're
nested, and the debug log is indented by things like
debug " foo"
That is: The indentation-level is hard-coded. It used to be that the
code couldn't recurse, so the indentation level could be known
statically, so it made sense to just hard-code it in the
output. However, since 315a84f9aa ("subtree: use commits before rejoins
for splits", 2018-09-28), it can now recurse, and the debug log is
misleading.
So fix that. Indent according to $indent.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, debug output (triggered by passing '-d') and progress output
stomp on each other. The debug output is just streamed as lines to
stderr, and the progress output is sent to stderr as '%s\r'. When
writing to a file, it is awkward to read and difficult to distinguish
between the debug output and a progress line. When writing to a
terminal the debug lines hide progress lines.
So, when '-d' has been passed, spit out progress as 'progress: %s\n',
instead of as '%s\r', so that it can be detected, and so that the debug
lines don't overwrite the progress when written to a terminal.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each function in subtree, add a usage comment saying what the
arguments are, and add an `assert` checking the number of arguments.
In figuring out each thing's arguments in order to write those comments
and assertions, it turns out that find_existing_splits is written as if
it takes multiple 'revs', but it is in fact only ever passed a single
'rev':
unrevs="$(find_existing_splits "$dir" "$rev")" || exit $?
So go ahead and codify that by documenting and asserting that it takes
exactly two arguments, one dir and one rev.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`cmd_add` starts with a check that the directory doesn't yet exist.
However, the `main` function performs the exact same check before
calling `cmd_add`. So remove the check from `cmd_add`.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main argument parser goes ahead and tries to parse revs to make
things simpler for the sub-command implementations. But, it includes
enough special cases for different sub-commands. And it's difficult
having having to think about "is this info coming from an argument, or a
global variable?". So the main argument parser's effort to make things
"simpler" ends up just making it more confusing and complicated.
Begone with the 'revs' global variable; parse 'rev=$(...)' as needed in
individual 'cmd_*' functions.
Begone with the 'default' global variable. Its would-be value is
knowable just from which function we're in.
Begone with the 'ensure_single_rev' function. Its functionality can be
achieved by passing '--verify' to 'git rev-parse'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They are synonyms. Both are used in the file. ^{commit} is clearer, so
"standardize" on that.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Scripts needing to fuss with with adding $(git --exec-prefix) PATH
before loading git-sh-setup is a thing of the past. As far as I can
tell, it's been a thing of the past since since Git v1.2.0 (2006-02-12),
or more specifically, since 77cb17e940 (Exec git programs without using
PATH, 2006-01-10). However, it stuck around in contrib scripts and in
third-party scripts for long enough that it wasn't unusual to see.
Originally `git subtree` didn't fuss with PATH, but when people
(including the original subtree author) had problems, because it was a
common thing to see, it seemed that having subtree fuss with PATH was a
reasonable solution.
Here is an abridged history of fussing with PATH in subtree:
2987e6add3 (Add explicit path of git installation by 'git --exec-path', Gianluca Pacchiella, 2009-08-20)
As pointed out by documentation, the correct use of 'git-sh-setup' is
using $(git --exec-path) to avoid problems with not standard
installations.
-. git-sh-setup
+. $(git --exec-path)/git-sh-setup
33aaa697a2 (Improve patch to use git --exec-path: add to PATH instead, Avery Pennarun, 2009-08-26)
If you (like me) are using a modified git straight out of its source
directory (ie. without installing), then --exec-path isn't actually correct.
Add it to the PATH instead, so if it is correct, it'll work, but if it's
not, we fall back to the previous behaviour.
-. $(git --exec-path)/git-sh-setup
+PATH=$(git --exec-path):$PATH
+. git-sh-setup
9c632ea29c ((Hopefully) fix PATH setting for msysgit, Avery Pennarun, 2010-06-24)
Reported by Evan Shaw. The problem is that $(git --exec-path) includes a
'git' binary which is incompatible with the one in /usr/bin; if you run it,
it gives you an error about libiconv2.dll.
+OPATH=$PATH
PATH=$(git --exec-path):$PATH
. git-sh-setup
+PATH=$OPATH # apparently needed for some versions of msysgit
df2302d774 (Another fix for PATH and msysgit, Avery Pennarun, 2010-06-24)
Evan Shaw tells me the previous fix didn't work. Let's use this one
instead, which he says does work.
This fix is kind of wrong because it will run the "correct" git-sh-setup
*after* the one in /usr/bin, if there is one, which could be weird if you
have multiple versions of git installed. But it works on my Linux and his
msysgit, so it's obviously better than what we had before.
-OPATH=$PATH
-PATH=$(git --exec-path):$PATH
+PATH=$PATH:$(git --exec-path)
. git-sh-setup
-PATH=$OPATH # apparently needed for some versions of msysgit
First of all, I disagree with Gianluca's reading of the documentation:
- I haven't gone back to read what the documentation said in 2009, but
in my reading of the 2021 documentation is that it includes "$(git
--exec-path)/" in the synopsis for illustrative purposes, not to say
it's the proper way.
- After being executed by `git`, the git exec path should be the very
first entry in PATH, so it shouldn't matter.
- None of the scripts that are part of git do it that way.
But secondly, the root reason for fussing with PATH seems to be that
Avery didn't know that he needs to set GIT_EXEC_PATH if he's going to
use git from the source directory without installing.
And finally, Evan's issue is clearly just a bug in msysgit. I assume
that msysgit has since fixed the issue, and also msysgit has been
deprecated for 6 years now, so let's drop the workaround for it.
So, remove the line fussing with PATH. However, since subtree *is* in
'contrib/' and it might get installed in funny ways by users
after-the-fact, add a sanity check to the top of the script, checking
that it is installed correctly.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"$*" is for when you want to concatenate the args together,
whitespace-separated; and "$@" is for when you want them to be separate
strings.
There are several places in subtree that erroneously use $@ when
concatenating args together into an error message.
For instance, if the args are argv[1]="dead" and argv[2]="beef", then
the line
die "You must provide exactly one revision. Got: '$@'"
surely intends to call 'die' with the argument
argv[1]="You must provide exactly one revision. Got: 'dead beef'"
however, because the line used $@ instead of $*, it will actually call
'die' with the arguments
argv[1]="You must provide exactly one revision. Got: 'dead"
argv[2]="beef'"
This isn't a big deal, because 'die' concatenates its arguments together
anyway (using "$*"). But that doesn't change the fact that it was a
mistake to use $@ instead of $*, even though in the end $@ still ended
up doing the right thing.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it painfully obvious when reading the code which variables are
direct parsings of command line arguments.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
subtree currently defines its own `say` implementation, rather than
using git-sh-setups's implementation. Change that, don't re-invent the
wheel.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing a slow `rev_is_descendant_of_branch $a $b` function
in shell, just use the fast `git merge-base --is-ancestor $b $a`.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suport for Git versions older than 1.7.0 (older than February 2010) was
nice to have when git-subtree lived out-of-tree. But now that it lives
in git.git, it's not necessary to keep around. While it's technically
in contrib, with the standard 'git' packages for common systems
(including Arch Linux and macOS) including git-subtree, it seems
vanishingly likely to me that people are separately installing
git-subtree from git.git alongside an older 'git' install (although it
also seems vanishingly likely that people are still using >11 year old
git installs).
Not that there's much reason to remove it either, it's not much code,
and none of my changes depend on a newer git (to my knowledge, anyway;
I'm not actually testing against older git). I just figure it's an easy
piece of fat to trim, in the journey to making the whole thing easier to
hack on.
"Ignore space change" is probably helpful when viewing this diff.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure that every $(subshell) that calls a function (as opposed to an
external executable) is followed by `|| exit $?`. Similarly, ensure that
every `cmd | while read; do ... done` loop is followed by `|| exit $?`.
Both of those constructs mean that it can miss `die` calls, and keep
running when it shouldn't.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shove all of the loose code inside of a main() function.
This comes down to personal preference more than anything else. A
preference that I've developed over years of maintaining large Bash
scripts, but still a mere personal preference.
In this specific case, it's also moving the `set -- -h`, the `git
rev-parse --parseopt`, and the `. git-sh-setup` to be closer to all
the rest of the argument parsing, which is a readability win on its
own, IMO.
"Ignore space change" is probably helpful when viewing this diff.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'pull' and 'push' subcommands deserve their own sections in the tests.
Add some basic tests for them.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a dumb test, but it's surprisingly easy to break.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7900-subtree.sh defines a helper function named last_commit_message.
However, it only returns the subject line of the commit message, not the
entire commit message. So rename it, to make the name less confusing.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As far as I can tell, this test isn't actually testing anything, because
someone forgot to tack on `--name-only` to `git log`. This seems to
have been the case since the test was first written, back in fa16ab36ad
("test.sh: make sure no commit changes more than one file at a time.",
2009-04-26), unless `git log` used to do that by default and didn't need
the flag back then?
Convincing myself that it's not actually testing anything was tricky,
the code is a little hard to reason about. It can be made a lot simpler
if instead of trying to parse all of the info from a single `git log`,
we're OK calling `git log` from inside of a loop. And it's my opinion
that tests are not the place for clever optimized code.
So, fix and simplify the test, so that it's actually testing something
and is simpler to reason about.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7900-subtree.sh defines its own `check_equal A B` function, instead of
just using `test A = B` like all of the other tests. Don't be special,
get rid of `check_equal` in favor of `test`.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's unclear what the purpose of t7900-subtree.sh's
`subtree_test_create_repo` helper function is. It wraps test-lib.sh's,
`test_create_repo` but follows that up by setting log.date=relative. Why
does it set log.date=relative?
My first guess was that at one point the tests required that, but no
longer do, and that the function is now vestigial. I even wrote a patch
to get rid of it and was moments away from `git send-email`ing it.
However, by chance when looking for something else in the history, I
discovered the true reason, from e7aac44ed2 (contrib/subtree: ignore
log.date configuration, 2015-07-21). It's testing that setting
log.date=relative doesn't break `git subtree`, as at one point in the past
that did break `git subtree`.
So, add a comment about this, to avoid future such confusion.
And while at it, go ahead and (1) touch up the function to avoid a
pointless subshell and (2) update the one test that didn't use it.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The formatting in t7900-subtree.sh isn't even consistent throughout the
file. Fix that; make it consistent throughout the file.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use test-lib.sh's `test_count`, instead instead of having
t7900-subtree.sh do its own book-keeping with `subtree_test_count` that
has to be explicitly incremented by calling `next_test`.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the tests had been converted to support
`GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`, but `contrib/subtree/t/`
hadn't.
Convert it. Most of the mentions of 'master' can just be replaced with
'HEAD'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `make -C contrib/subtree/ test` creates a `git-subtree` executable
in the root of the repo. Add it to the .gitignore so that anyone hacking
on subtree won't have to deal with that noise.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git repack -A -d` is run in a partial clone, `pack-objects`
is invoked twice: once to repack all promisor objects, and once to
repack all non-promisor objects. The latter `pack-objects` invocation
is with --exclude-promisor-objects and --unpack-unreachable, which
loosens all objects unused during this invocation. Unfortunately,
this includes promisor objects.
Because the -d argument to `git repack` subsequently deletes all loose
objects also in packs, these just-loosened promisor objects will be
immediately deleted. However, this extra disk churn is unnecessary in
the first place. For example, in a newly-cloned partial repo that
filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up
unpacking all trees and commits into the filesystem because every
object, in this particular case, is a promisor object. Depending on
the repo size, this increases the disk usage considerably: In my copy
of the linux.git, the object directory peaked 26GB of more disk usage.
In order to avoid this extra disk churn, pass the names of the promisor
packfiles as --keep-pack arguments to the second invocation of
`pack-objects`. This informs `pack-objects` that the promisor objects
are already in a safe packfile and, therefore, do not need to be
loosened.
For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the
process which prevents us from inspecting the object directory.
Instead, let's teach `pack-objects` to count loosened objects and
emit via trace2 thus allowing inspecting the debug events after the
process is finished. This new event is used on the added regression
test.
Lastly, add a new perf test to evaluate the performance impact
made by this changes (tested on git.git):
Test HEAD^ HEAD
----------------------------------------------------------
5600.3: gc 134.38(41.93+90.95) 7.80(6.72+1.35) -94.2%
For a bigger repository, such as linux.git, the improvement is
even bigger:
Test HEAD^ HEAD
-------------------------------------------------------------------
5600.3: gc 6833.00(918.07+3162.74) 268.79(227.02+39.18) -96.1%
These improvements are particular big because every object in the
newly-cloned partial repository is a promisor object.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From the documentation for generating patch text with diff-related
commands, refer to the documentation for the diff attribute.
This attribute influences the way that patches are generated, but this
was previously not mentioned in e.g., the git-diff manpage.
Signed-off-by: Peter Oliver <git@mavit.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_pathspec() populates pathspec, hence we need to clear it once it's
no longer needed. seen is xcalloc'd within the same function and
likewise needs to be freed once its no longer needed.
cmd_rm() has multiple early returns, therefore we need to clear or free
as soon as this data is no longer needed, as opposed to doing a cleanup
at the end.
LSAN output from t0020:
Direct leak of 112 byte(s) in 1 object(s) allocated from:
#0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9ac0a4 in do_xmalloc wrapper.c:41:8
#2 0x9ac07a in xmalloc wrapper.c:62:9
#3 0x873277 in parse_pathspec pathspec.c:582:2
#4 0x646ffa in cmd_rm builtin/rm.c:266:2
#5 0x4cd91d in run_builtin git.c:467:11
#6 0x4cb5f3 in handle_builtin git.c:719:3
#7 0x4ccf47 in run_argv git.c:808:4
#8 0x4caf49 in cmd_main git.c:939:19
#9 0x69dc0e in main common-main.c:52:11
#10 0x7f948825b349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9ac2a6 in xrealloc wrapper.c:126:8
#2 0x93b14d in strbuf_grow strbuf.c:98:2
#3 0x93ccf6 in strbuf_vaddf strbuf.c:392:3
#4 0x93f726 in xstrvfmt strbuf.c:979:2
#5 0x93f8b3 in xstrfmt strbuf.c:989:8
#6 0x92ad8a in prefix_path_gently setup.c:115:15
#7 0x873a8d in init_pathspec_item pathspec.c:439:11
#8 0x87334f in parse_pathspec pathspec.c:589:3
#9 0x646ffa in cmd_rm builtin/rm.c:266:2
#10 0x4cd91d in run_builtin git.c:467:11
#11 0x4cb5f3 in handle_builtin git.c:719:3
#12 0x4ccf47 in run_argv git.c:808:4
#13 0x4caf49 in cmd_main git.c:939:19
#14 0x69dc0e in main common-main.c:52:11
#15 0x7f948825b349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 15 byte(s) in 1 object(s) allocated from:
#0 0x486834 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0x9ac048 in xstrdup wrapper.c:29:14
#2 0x873ba2 in init_pathspec_item pathspec.c:468:20
#3 0x87334f in parse_pathspec pathspec.c:589:3
#4 0x646ffa in cmd_rm builtin/rm.c:266:2
#5 0x4cd91d in run_builtin git.c:467:11
#6 0x4cb5f3 in handle_builtin git.c:719:3
#7 0x4ccf47 in run_argv git.c:808:4
#8 0x4caf49 in cmd_main git.c:939:19
#9 0x69dc0e in main common-main.c:52:11
#10 0x7f948825b349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 1 byte(s) in 1 object(s) allocated from:
#0 0x49a9d2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x9ac392 in xcalloc wrapper.c:140:8
#2 0x647108 in cmd_rm builtin/rm.c:294:9
#3 0x4cd91d in run_builtin git.c:467:11
#4 0x4cb5f3 in handle_builtin git.c:719:3
#5 0x4ccf47 in run_argv git.c:808:4
#6 0x4caf49 in cmd_main git.c:939:19
#7 0x69dbfe in main common-main.c:52:11
#8 0x7f4fac1b0349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
options.git_format_patch_opt can be populated during cmd_rebase's setup,
and will therefore leak on return. Although we could just UNLEAK all of
options, we choose to strbuf_release() the individual member, which matches
the existing pattern (where we're freeing invidual members of options).
Leak found when running t0021:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9ac296 in xrealloc wrapper.c:126:8
#2 0x93b13d in strbuf_grow strbuf.c:98:2
#3 0x93bd3a in strbuf_add strbuf.c:295:2
#4 0x60ae92 in strbuf_addstr strbuf.h:304:2
#5 0x605f17 in cmd_rebase builtin/rebase.c:1759:3
#6 0x4cd91d in run_builtin git.c:467:11
#7 0x4cb5f3 in handle_builtin git.c:719:3
#8 0x4ccf47 in run_argv git.c:808:4
#9 0x4caf49 in cmd_main git.c:939:19
#10 0x69dbfe in main common-main.c:52:11
#11 0x7f66dae91349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
sorting might be a list allocated in ref_default_sorting() (in this case
it's a fixed single item list, which has nevertheless been xcalloc'd),
or it might be a list allocated in parse_opt_ref_sorting(). In either
case we could free these lists - but instead we UNLEAK as we're at the
end of cmd_for_each_ref. (There's no existing implementation of
clear_ref_sorting(), and writing a loop to free the list seems more
trouble than it's worth.)
filter.with_commit/no_commit are populated via
OPT_CONTAINS/OPT_NO_CONTAINS, both of which create new entries via
parse_opt_commits(), and also need to be free'd or UNLEAK'd. Because
free_commit_list() already exists, we choose to use that over an UNLEAK.
LSAN output from t0041:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x49a9d2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x9ac252 in xcalloc wrapper.c:140:8
#2 0x8a4a55 in ref_default_sorting ref-filter.c:2486:32
#3 0x56c6b1 in cmd_for_each_ref builtin/for-each-ref.c:72:13
#4 0x4cd91d in run_builtin git.c:467:11
#5 0x4cb5f3 in handle_builtin git.c:719:3
#6 0x4ccf47 in run_argv git.c:808:4
#7 0x4caf49 in cmd_main git.c:939:19
#8 0x69dabe in main common-main.c:52:11
#9 0x7f2bdc570349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9abf54 in do_xmalloc wrapper.c:41:8
#2 0x9abf2a in xmalloc wrapper.c:62:9
#3 0x717486 in commit_list_insert commit.c:540:33
#4 0x8644cf in parse_opt_commits parse-options-cb.c:98:2
#5 0x869bb5 in get_value parse-options.c:181:11
#6 0x8677dc in parse_long_opt parse-options.c:378:10
#7 0x8659bd in parse_options_step parse-options.c:817:11
#8 0x867fcd in parse_options parse-options.c:870:10
#9 0x56c62b in cmd_for_each_ref builtin/for-each-ref.c:59:2
#10 0x4cd91d in run_builtin git.c:467:11
#11 0x4cb5f3 in handle_builtin git.c:719:3
#12 0x4ccf47 in run_argv git.c:808:4
#13 0x4caf49 in cmd_main git.c:939:19
#14 0x69dabe in main common-main.c:52:11
#15 0x7f2bdc570349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
mailinfo.p_hdr_info/s_hdr_info are null-terminated lists of strbuf's,
with entries pointing either to NULL or an allocated strbuf. Therefore
we need to free those strbuf's (and not just the data they contain)
whenever we're done with a given entry. (See handle_header() where those
new strbufs are malloc'd.)
Once we no longer need the list (and not just its entries) we can switch
over to strbuf_list_free() instead of manually iterating over the list,
which takes care of those additional details for us. We can only do this
in clear_mailinfo() - in handle_commit_message() we are only clearing the
array contents but want to reuse the array itself, hence we can't use
strbuf_list_free() there.
However, strbuf_list_free() cannot handle a NULL input, and the lists we
are freeing might be NULL. Therefore we add a NULL check in
strbuf_list_free() to make it safe to use with a NULL input (which is a
pattern used by some of the other *_free() functions around git).
Leak output from t0023:
Direct leak of 72 byte(s) in 3 object(s) allocated from:
#0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9ac9f4 in do_xmalloc wrapper.c:41:8
#2 0x9ac9ca in xmalloc wrapper.c:62:9
#3 0x7f6cf7 in handle_header mailinfo.c:205:10
#4 0x7f5abf in check_header mailinfo.c:583:4
#5 0x7f5524 in mailinfo mailinfo.c:1197:3
#6 0x4dcc95 in parse_mail builtin/am.c:1167:6
#7 0x4d9070 in am_run builtin/am.c:1732:12
#8 0x4d5b7a in cmd_am builtin/am.c:2398:3
#9 0x4cd91d in run_builtin git.c:467:11
#10 0x4cb5f3 in handle_builtin git.c:719:3
#11 0x4ccf47 in run_argv git.c:808:4
#12 0x4caf49 in cmd_main git.c:939:19
#13 0x69e43e in main common-main.c:52:11
#14 0x7fc1fadfa349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 72 byte(s) leaked in 3 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
add_pending_object() populates rev.pending, we need to take care of
clearing it once we're done.
This code is run close to the end of a checkout, therefore this leak
seems like it would have very little impact. See also LSAN output
from t0020 below:
Direct leak of 2048 byte(s) in 1 object(s) allocated from:
#0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9acc46 in xrealloc wrapper.c:126:8
#2 0x83e3a3 in add_object_array_with_path object.c:337:3
#3 0x8f672a in add_pending_object_with_path revision.c:329:2
#4 0x8eaeab in add_pending_object_with_mode revision.c:336:2
#5 0x8eae9d in add_pending_object revision.c:342:2
#6 0x5154a0 in show_local_changes builtin/checkout.c:602:2
#7 0x513b00 in merge_working_tree builtin/checkout.c:979:3
#8 0x512cb3 in switch_branches builtin/checkout.c:1242:9
#9 0x50f8de in checkout_branch builtin/checkout.c:1646:9
#10 0x50ba12 in checkout_main builtin/checkout.c:2003:9
#11 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8
#12 0x4cd91d in run_builtin git.c:467:11
#13 0x4cb5f3 in handle_builtin git.c:719:3
#14 0x4ccf47 in run_argv git.c:808:4
#15 0x4caf49 in cmd_main git.c:939:19
#16 0x69e43e in main common-main.c:52:11
#17 0x7f5dd1d50349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 2048 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_pathspec() allocates new memory into pathspec, therefore we need
to free it when we're done.
An UNLEAK would probably be just as good here - but clear_pathspec() is
not much more work so we might as well use it. check_ignore() is either
called once directly from cmd_check_ignore() (in which case the leak
really doesnt matter), or it can be called multiple times in a loop from
check_ignore_stdin_paths(), in which case we're potentially leaking
multiple times - but even in this scenario the leak is so small as to
have no real consequence.
Found while running t0008:
Direct leak of 112 byte(s) in 1 object(s) allocated from:
#0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9aca44 in do_xmalloc wrapper.c:41:8
#2 0x9aca1a in xmalloc wrapper.c:62:9
#3 0x873c17 in parse_pathspec pathspec.c:582:2
#4 0x503eb8 in check_ignore builtin/check-ignore.c:90:2
#5 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17
#6 0x4cd91d in run_builtin git.c:467:11
#7 0x4cb5f3 in handle_builtin git.c:719:3
#8 0x4ccf47 in run_argv git.c:808:4
#9 0x4caf49 in cmd_main git.c:939:19
#10 0x69e43e in main common-main.c:52:11
#11 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9acc46 in xrealloc wrapper.c:126:8
#2 0x93baed in strbuf_grow strbuf.c:98:2
#3 0x93d696 in strbuf_vaddf strbuf.c:392:3
#4 0x9400c6 in xstrvfmt strbuf.c:979:2
#5 0x940253 in xstrfmt strbuf.c:989:8
#6 0x92b72a in prefix_path_gently setup.c:115:15
#7 0x87442d in init_pathspec_item pathspec.c:439:11
#8 0x873cef in parse_pathspec pathspec.c:589:3
#9 0x503eb8 in check_ignore builtin/check-ignore.c:90:2
#10 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17
#11 0x4cd91d in run_builtin git.c:467:11
#12 0x4cb5f3 in handle_builtin git.c:719:3
#13 0x4ccf47 in run_argv git.c:808:4
#14 0x4caf49 in cmd_main git.c:939:19
#15 0x69e43e in main common-main.c:52:11
#16 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 2 byte(s) in 1 object(s) allocated from:
#0 0x486834 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0x9ac9e8 in xstrdup wrapper.c:29:14
#2 0x874542 in init_pathspec_item pathspec.c:468:20
#3 0x873cef in parse_pathspec pathspec.c:589:3
#4 0x503eb8 in check_ignore builtin/check-ignore.c:90:2
#5 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17
#6 0x4cd91d in run_builtin git.c:467:11
#7 0x4cb5f3 in handle_builtin git.c:719:3
#8 0x4ccf47 in run_argv git.c:808:4
#9 0x4caf49 in cmd_main git.c:939:19
#10 0x69e43e in main common-main.c:52:11
#11 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 179 byte(s) leaked in 3 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
prefix_filename() returns newly allocated memory, and strbuf_addstr()
doesn't take ownership of its inputs. Therefore we have to make sure to
store and free prefix_filename()'s result.
As this leak is in cmd_bugreport(), we could just as well UNLEAK the
prefix - but there's no good reason not to just free it properly. This
leak was found while running t0091, see output below:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x49ab79 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9acc66 in xrealloc wrapper.c:126:8
#2 0x93baed in strbuf_grow strbuf.c:98:2
#3 0x93c6ea in strbuf_add strbuf.c:295:2
#4 0x69f162 in strbuf_addstr ./strbuf.h:304:2
#5 0x69f083 in prefix_filename abspath.c:277:2
#6 0x4fb275 in cmd_bugreport builtin/bugreport.c:146:9
#7 0x4cd91d in run_builtin git.c:467:11
#8 0x4cb5f3 in handle_builtin git.c:719:3
#9 0x4ccf47 in run_argv git.c:808:4
#10 0x4caf49 in cmd_main git.c:939:19
#11 0x69df9e in main common-main.c:52:11
#12 0x7f523a987349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
real_ref was previously populated by dwim_ref(), which allocates new
memory. We need to make sure to free real_ref when discarding it.
(real_ref is already being freed at the end of create_branch() - but
if we discard it early then it will leak.)
This fixes the following leak found while running t0002-t0099:
Direct leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x486954 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xdd6484 in xstrdup wrapper.c:29:14
#2 0xc0f658 in expand_ref refs.c:671:12
#3 0xc0ecf1 in repo_dwim_ref refs.c:644:22
#4 0x8b1184 in dwim_ref ./refs.h:162:9
#5 0x8b0b02 in create_branch branch.c:284:10
#6 0x550cbb in update_refs_for_switch builtin/checkout.c:1046:4
#7 0x54e275 in switch_branches builtin/checkout.c:1274:2
#8 0x548828 in checkout_branch builtin/checkout.c:1668:9
#9 0x541306 in checkout_main builtin/checkout.c:2025:9
#10 0x5395fa in cmd_checkout builtin/checkout.c:2077:8
#11 0x4d02a8 in run_builtin git.c:467:11
#12 0x4cbfe9 in handle_builtin git.c:719:3
#13 0x4cf04f in run_argv git.c:808:4
#14 0x4cb85a in cmd_main git.c:939:19
#15 0x820cf6 in main common-main.c:52:11
#16 0x7f30bd9dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fill_bloom_key() allocates memory into bloom_key, we need to clean that
up once the key is no longer needed.
This leak was found while running t0002-t0099. Although this leak is
happening in code being called from a test-helper, the same code is also
used in various locations around git, and can therefore happen during
normal usage too. Gabor's analysis shows that peak-memory usage during
'git commit-graph write' is reduced on the order of 10% for a selection
of larger repos (along with an even larger reduction if we override
modified path bloom filter limits):
https://lore.kernel.org/git/20210411072651.GF2947267@szeder.dev/
LSAN output:
Direct leak of 308 byte(s) in 11 object(s) allocated from:
#0 0x49a5e2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x6f4032 in xcalloc wrapper.c:140:8
#2 0x4f2905 in fill_bloom_key bloom.c:137:28
#3 0x4f34c1 in get_or_compute_bloom_filter bloom.c:284:4
#4 0x4cb484 in get_bloom_filter_for_commit t/helper/test-bloom.c:43:11
#5 0x4cb072 in cmd__bloom t/helper/test-bloom.c:97:3
#6 0x4ca7ef in cmd_main t/helper/test-tool.c:121:11
#7 0x4caace in main common-main.c:52:11
#8 0x7f798af95349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 308 byte(s) leaked in 11 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
common_prefix() returns a new string, which we store in max_prefix -
this string needs to be freed to avoid a leak. This leak is happening
in cmd_ls_files, hence is of no real consequence - an UNLEAK would be
just as good, but we might as well free the string properly.
Leak found while running t0002, see output below:
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x49a85d in malloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9ab1b4 in do_xmalloc wrapper.c:41:8
#2 0x9ab248 in do_xmallocz wrapper.c:75:8
#3 0x9ab22a in xmallocz wrapper.c:83:9
#4 0x9ab2d7 in xmemdupz wrapper.c:99:16
#5 0x78d6a4 in common_prefix dir.c:191:15
#6 0x5aca48 in cmd_ls_files builtin/ls-files.c:669:16
#7 0x4cd92d in run_builtin git.c:453:11
#8 0x4cb5fa in handle_builtin git.c:704:3
#9 0x4ccf57 in run_argv git.c:771:4
#10 0x4caf49 in cmd_main git.c:902:19
#11 0x69ce2e in main common-main.c:52:11
#12 0x7f64d4d94349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rev.prune_data is populated (in multiple functions) via copy_pathspec,
and therefore needs to be cleared after running the diff in those
functions.
rev(_info).pending is populated indirectly via setup_revisions, and also
needs to be cleared once diffing is done.
These leaks were found while running t0008 or t0021. The rev.prune_data
leaks are small (80B) but noisy, hence I won't bother including their
logs - the rev.pending leaks are bigger, and can happen early in the
course of other commands, and therefore possibly more valuable to fix -
see example log from a rebase below:
Direct leak of 2048 byte(s) in 1 object(s) allocated from:
#0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9ac2a6 in xrealloc wrapper.c:126:8
#2 0x83da03 in add_object_array_with_path object.c:337:3
#3 0x8f5d8a in add_pending_object_with_path revision.c:329:2
#4 0x8ea50b in add_pending_object_with_mode revision.c:336:2
#5 0x8ea4fd in add_pending_object revision.c:342:2
#6 0x8ea610 in add_head_to_pending revision.c:354:2
#7 0x9b55f5 in has_uncommitted_changes wt-status.c:2474:2
#8 0x9b58c4 in require_clean_work_tree wt-status.c:2553:6
#9 0x606bcc in cmd_rebase builtin/rebase.c:1970:6
#10 0x4cd91d in run_builtin git.c:467:11
#11 0x4cb5f3 in handle_builtin git.c:719:3
#12 0x4ccf47 in run_argv git.c:808:4
#13 0x4caf49 in cmd_main git.c:939:19
#14 0x69dc0e in main common-main.c:52:11
#15 0x7f2d18909349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x486834 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0x9ac048 in xstrdup wrapper.c:29:14
#2 0x83da8d in add_object_array_with_path object.c:349:17
#3 0x8f5d8a in add_pending_object_with_path revision.c:329:2
#4 0x8ea50b in add_pending_object_with_mode revision.c:336:2
#5 0x8ea4fd in add_pending_object revision.c:342:2
#6 0x8ea610 in add_head_to_pending revision.c:354:2
#7 0x9b55f5 in has_uncommitted_changes wt-status.c:2474:2
#8 0x9b58c4 in require_clean_work_tree wt-status.c:2553:6
#9 0x606bcc in cmd_rebase builtin/rebase.c:1970:6
#10 0x4cd91d in run_builtin git.c:467:11
#11 0x4cb5f3 in handle_builtin git.c:719:3
#12 0x4ccf47 in run_argv git.c:808:4
#13 0x4caf49 in cmd_main git.c:939:19
#14 0x69dc0e in main common-main.c:52:11
#15 0x7f2d18909349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 2053 byte(s) leaked in 2 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
limit_list() iterates over the original revs->commits list, and consumes
many of its entries via pop_commit. However we might stop iterating over
the list early (e.g. if we realise that the rest of the list is
uninteresting). If we do stop iterating early, list will be pointing to
the unconsumed portion of revs->commits - and we need to free this list
to avoid a leak. (revs->commits itself will be an invalid pointer: it
will have been free'd during the first pop_commit.)
However the list pointer is later reused to iterate over our new list,
but only for the limiting_can_increase_treesame() branch. We therefore
need to introduce a new variable for that branch - and while we're here
we can rename the original list to original_list as that makes its
purpose more obvious.
This leak was found while running t0090. It's not likely to be very
impactful, but it can happen quite early during some checkout
invocations, and hence seems to be worth fixing:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9ac084 in do_xmalloc wrapper.c:41:8
#2 0x9ac05a in xmalloc wrapper.c:62:9
#3 0x7175d6 in commit_list_insert commit.c:540:33
#4 0x71800f in commit_list_insert_by_date commit.c:604:9
#5 0x8f8d2e in process_parents revision.c:1128:5
#6 0x8f2f2c in limit_list revision.c:1418:7
#7 0x8f210e in prepare_revision_walk revision.c:3577:7
#8 0x514170 in orphaned_commit_warning builtin/checkout.c:1185:6
#9 0x512f05 in switch_branches builtin/checkout.c:1250:3
#10 0x50f8de in checkout_branch builtin/checkout.c:1646:9
#11 0x50ba12 in checkout_main builtin/checkout.c:2003:9
#12 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8
#13 0x4cd91d in run_builtin git.c:467:11
#14 0x4cb5f3 in handle_builtin git.c:719:3
#15 0x4ccf47 in run_argv git.c:808:4
#16 0x4caf49 in cmd_main git.c:939:19
#17 0x69dc0e in main common-main.c:52:11
#18 0x7faaabd0e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Indirect leak of 48 byte(s) in 3 object(s) allocated from:
#0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9ac084 in do_xmalloc wrapper.c:41:8
#2 0x9ac05a in xmalloc wrapper.c:62:9
#3 0x717de6 in commit_list_append commit.c:1609:35
#4 0x8f1f9b in prepare_revision_walk revision.c:3554:12
#5 0x514170 in orphaned_commit_warning builtin/checkout.c:1185:6
#6 0x512f05 in switch_branches builtin/checkout.c:1250:3
#7 0x50f8de in checkout_branch builtin/checkout.c:1646:9
#8 0x50ba12 in checkout_main builtin/checkout.c:2003:9
#9 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8
#10 0x4cd91d in run_builtin git.c:467:11
#11 0x4cb5f3 in handle_builtin git.c:719:3
#12 0x4ccf47 in run_argv git.c:808:4
#13 0x4caf49 in cmd_main git.c:939:19
#14 0x69dc0e in main common-main.c:52:11
#15 0x7faaabd0e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all code paths correctly set the hash algorithm member of
struct object_id, write an object's hex representation using the hash
algorithm member embedded in it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are numerous places in the codebase where we assume we can
initialize data by zeroing all its bytes. However, when we do that with
a struct object_id, it leaves the structure with a zero value for the
algorithm, which is invalid.
We could forbid this pattern and require that all struct object_id
instances be initialized using oidclr, but this seems burdensome and
it's unnatural to most C programmers. Instead, if the algorithm is
zero, assume we wanted to use the default hash algorithm instead.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use struct object_id for the names of objects. It isn't intended to
be used for other hash values that don't name objects such as the pack
hash.
Because struct object_id will soon need to have its algorithm member
set, using it in this code path would mean that we didn't set that
member, only the hash member, which would result in a crash. For both
of these reasons, switch to using an unsigned char array of size
GIT_MAX_RAWSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea behind struct object_id is that it is supposed to represent the
identifier of a standard Git object or a special pseudo-object like the
all-zeros object ID. In this case, we have file hashes, which, while
similar, are distinct from the identifiers of objects.
Switch these code paths to use an unsigned char array. This is both
more logically consistent and it means that we need not set the
algorithm identifier for the struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In most cases, when we load the hash of an object into a struct
object_id, we load it using one of the oid* or *_oid_hex functions.
However, for git show-index, we read it in directly using fread. As a
consequence, set the algorithm correctly so the objects can be used
correctly both now and in the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Up until recently, object IDs did not have an algorithm member, only a
hash. Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms. Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.
Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that struct object_id has an algorithm field, we should populate it.
This will allow us to handle object IDs in any supported algorithm and
distinguish between them. Ensure that the field is written whenever we
write an object ID by storing it explicitly every time we write an
object. Set values for the empty blob and tree values as well.
In addition, use the algorithm field to compare object IDs. Note that
because we zero-initialize struct object_id in many places throughout
the codebase, we default to the default algorithm in cases where the
algorithm field is zero rather than explicitly initialize all of those
locations.
This leads to a branch on every comparison, but the alternative is to
compare the entire buffer each time and padding the buffer for SHA-1.
That alternative ranges up to 3.9% worse than this approach on the perf
t0001, t1450, and t1451.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we need our instances of struct object_id to be zero padded, we
can no longer cast unsigned char buffers to be pointers to struct
object_id. This file reads data out of the pack objects and then
inserts it directly into a linked list item which is a pointer to struct
object_id. Instead, let's have the linked list item hold its own struct
object_id and copy the data into it.
In addition, since these are not really pointers to struct object_id,
stop passing them around as such, and call them what they really are:
pointers to unsigned char.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're hashing a value which is going to be an object ID, we want to
zero-pad that value if necessary. To do so, use the final_oid_fn
instead of the final_fn anytime we're going to create an object ID to
ensure we perform this operation.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid the penalty of having to branch in hash comparison functions,
we'll want to always compare the full hash member in a struct object_id,
which will require that SHA-1 object IDs be zero-padded. To do so, add
a function which finalizes a hash context and writes it into an object
ID that performs this padding.
Move the definition of struct object_id and the constant definitions
higher up so we they are available for us to use.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In most places in the codebase, we use oidread to properly read an
object ID into a struct object_id. However, in the HTTP code, we end up
needing to parse a loose object path with a slash in it, so we can't do
that. Let's instead explicitly set the algorithm in this function so we
can rely on it in the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we'll want oidread to automatically set the hash
algorithm member for an object ID we read into it, so ensure we use
oidread instead of hashcpy everywhere we're copying a hash value into a
struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we're working with multiple hash algorithms in the same repo,
it's best if we label each object ID with its algorithm so we can
determine how to format a given object ID. Add a member called algo to
struct object_id.
Performance testing on object ID-heavy workloads doesn't reveal a clear
change in performance. Out of performance tests t0001 and t1450, there
are slight variations in performance both up and down, but all
measurements are within the margin of error.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the placeholders %ah and %ch to format author date and committer
date, like --date=human does, which provides more humanity date output.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the description for the --date/format equivalency tests added
in 466fb6742d (pretty: provide a strict ISO 8601 date format,
2014-08-29) and 0df621172d (pretty: provide short date format,
2019-11-19) to be more meaningful.
This allows us to reword the comment added in the former commit to
refer to both tests, and any other future test, such as the in-flight
--date=human format being proposed in [1].
1. http://lore.kernel.org/git/pull.939.v2.git.1619275340051.gitgitgadget@gmail.com
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a needlessly complex test for the %aI/%cI date
formats (iso-strict) added in 466fb6742d (pretty: provide a strict
ISO 8601 date format, 2014-08-29) to instead use the same pattern used
to test %as/%cs since 0df621172d (pretty: provide short date format,
2019-11-19).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The $subcommand case statement in _git_stash() is quite repetitive.
Consolidate the cases together into one catch-all case to reduce the
repetition.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the introduction of the $__git_cmd_idx variable in e94fb44042
(git-completion.bash: pass $__git_subcommand_idx from __git_main(),
2021-03-24), completion functions were able to know the index at which
the git command is listed, allowing them to skip options that are given
to the underlying git itself, not the corresponding command (e.g.
`-C asdf` in `git -C asdf branch`).
While most of the changes here are self-explanatory, some bear further
explanation.
For the __git_find_on_cmdline() and __git_find_last_on_cmdline() pair of
functions, these functions are only ever called in the context of a git
command completion function. These functions will only care about words
after the command so we can safely ignore the words before this.
For _git_worktree(), this change is technically a no-op (once the
__git_find_last_on_cmdline change is also applied). It was in poor style
to have hard-coded on the index right after `worktree`. In case
`git worktree` were to ever learn to accept options, the current
situation would be inflexible.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In e94fb44042 (git-completion.bash: pass $__git_subcommand_idx from
__git_main(), 2021-03-24), the $__git_subcommand_idx variable was
introduced. Naming it after the index of the subcommand is needlessly
confusing as, when this variable is used, it is in the completion
functions for commands (e.g. _git_remote()) where for `git remote add`,
the `remote` is referred to as the command and `add` is referred to as
the subcommand.
Rename this variable so that it's obvious it's about git commands. While
we're at it, shorten up its name so that it's still readable without
being a handful to type.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to test whether the new GIT_CONFIG_SYSTEM environment variable
behaves as expected, we unset GIT_CONFIG_NOSYSTEM in one of our tests in
t1300. But because tests are not executed in a subshell, this unset
leaks into all subsequent tests and may thus cause them to fail in some
environments. These failures are easily reproducable with `make
prefix=/root test`.
Fix the issue by not using `sane_unset GIT_CONFIG_NOSYSTEM`, but instead
just manually add it to the environment of the two command invocations
which need it.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes it is useful to get information of an object without having to
download it completely.
Add the "object-info" capability that lets the client ask for
object-related information with their full hexadecimal object names.
Only sizes are returned for now.
Signed-off-by: Bruno Albuquerque <bga@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dependencies for config-list.h and command-list.h were broken
when the former was split out of the latter, which has been
corrected.
* sg/bugreport-fixes:
Makefile: add missing dependencies of 'config-list.h'
Documentation updates, with unrelated comment updates, too.
* ab/usage-error-docs:
api docs: document that BUG() emits a trace2 error event
api docs: document BUG() in api-error-handling.txt
usage.c: don't copy/paste the same comment three times
When "git pack-objects" makes a literal copy of a part of existing
packfile using the reachability bitmaps, its update to the progress
meter was broken.
* jk/pack-objects-bitmap-progress-fix:
pack-objects: update "nr_seen" progress based on pack-reused count
A bit of code clean-up and a lot of test clean-up around userdiff
area.
* ab/userdiff-tests:
blame tests: simplify userdiff driver test
blame tests: don't rely on t/t4018/ directory
userdiff: remove support for "broken" tests
userdiff tests: list builtin drivers via test-tool
userdiff tests: explicitly test "default" pattern
userdiff: add and use for_each_userdiff_driver()
userdiff style: normalize pascal regex declaration
userdiff style: declare patterns with consistent style
userdiff style: re-order drivers in alphabetical order
In e94fb44042 (git-completion.bash: pass $__git_subcommand_idx from
__git_main(), 2021-03-24), a line was introduced which contained
multiple statements. This is difficult to read so break it into multiple
lines.
While we're at it, follow this convention for the rest of the
__git_main() and break up lines that contain multiple statements.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
AFAICT parsing the output of `git diff --name-only master...feature`
is the intended way of programmatically getting the list of files
modified
by a feature branch. It is impossible to parse text unless you know what
encoding it is in. The output encoding of diff --name-only and
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use `git for-each-ref`, every ref will allocate
its own output strbuf and error strbuf. But we can reuse
the final strbuf for each step ref's output. The error
buffer will also be reused, despite the fact that the git
will exit when `format_ref_array_item()` return a non-zero
value and output the contents of the error buffer.
The performance for `git for-each-ref` on the Git repository
itself with performance testing tool `hyperfine` changes from
23.7 ms ± 0.9 ms to 22.2 ms ± 1.0 ms. Optimization is relatively
minor.
At the same time, we apply this optimization to `git tag -l`
and `git branch -l`.
This approach is similar to the one used by 79ed0a5
(cat-file: use a single strbuf for all output, 2018-08-14)
to speed up the cat-file builtin.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inlining the exported function `show_ref_array_item()`,
which is not providing the right level of abstraction,
simplifies the API and can unlock improvements at the
former call sites.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to have git run in a fully controlled environment without any
misconfiguration, it may be desirable for users or scripts to override
global- and system-level configuration files. We already have a way of
doing this, which is to unset both HOME and XDG_CONFIG_HOME environment
variables and to set `GIT_CONFIG_NOGLOBAL=true`. This is quite kludgy,
and unsetting the first two variables likely has an impact on other
executables spawned by such a script.
The obvious way to fix this would be to introduce `GIT_CONFIG_NOGLOBAL`
as an equivalent to `GIT_CONFIG_NOSYSTEM`. But in the past, it has
turned out that this design is inflexible: we cannot test system-level
parsing of the git configuration in our test harness because there is no
way to change its location, so all tests run with `GIT_CONFIG_NOSYSTEM`
set.
Instead of doing the same mistake with `GIT_CONFIG_NOGLOBAL`, introduce
two new variables `GIT_CONFIG_GLOBAL` and `GIT_CONFIG_SYSTEM`:
- If unset, git continues to use the usual locations.
- If set to a specific path, we skip reading the normal
configuration files and instead take the path. By setting the path
to `/dev/null`, no configuration will be loaded for the respective
level.
This implements the usecase where we want to execute code in a sanitized
environment without any potential misconfigurations via `/dev/null`, but
is more flexible and allows for more usecases than simply adding
`GIT_CONFIG_NOGLOBAL`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's two callsites which assemble global config paths, once in the
config loading code and once in the git-config(1) builtin. We're about
to implement a way to override global config paths via an environment
variable which would require us to adjust both sites.
Unify both code paths into a single `git_global_config()` function which
returns both paths for `~/.gitconfig` and the XDG config file. This will
make the subsequent patch which introduces the new envvar easier to
implement.
No functional changes are expected from this patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git_etc_gitconfig()` function retrieves the system-level path of
the configuration file. We're about to introduce a way to override it
via an environment variable, at which point the name of this function
would start to become misleading.
Rename the function to `git_system_config()` as a preparatory step.
While at it, the function is also refactored to pass memory ownership to
the caller. This is done to better match semantics of
`git_global_config()`, which is going to be introduced in the next
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD` , the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter given
that even if the user wants to only see blobs, he'll still see commits
of provided references.
Improve this by introducing a new `--filter-provided-objects` option
to the git-rev-parse(1) command. If given, then all user-provided
references will be subject to filtering.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user has multiple objects filters specified, then this is
internally represented by having a "combined" filter. These combined
filters aren't yet supported by bitmap indices and can thus not be
accelerated.
Fix this by implementing support for these combined filters. The
implementation is quite trivial: when there's a combined filter, we
simply recurse into `filter_bitmap()` for all of the sub-filters.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The preceding commit has added a new object filter for git-rev-list(1)
which allows to filter objects by type. Implement the equivalent filter
for packfile bitmaps so that we can answer these queries fast.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it already is possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to filter out only a specific
type of objects. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it is unfit to ask only for these
smallish blobs, given that git-rev-list(1) will continue to print tags,
commits and trees.
Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. Above query can thus
trivially be implemented with the following command:
$ git rev-list --objects --filter=object:type=blob \
--filter=blob:limit=200
Furthermore, this filter allows to optimize for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.
The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.
To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.
Local SSD:
Sequential 10 workers Speedup
Clone 8.805 s ± 0.043 s 3.564 s ± 0.041 s 2.47 ± 0.03
Checkout I 9.678 s ± 0.057 s 4.486 s ± 0.050 s 2.16 ± 0.03
Checkout II 5.034 s ± 0.072 s 3.021 s ± 0.038 s 1.67 ± 0.03
Local HDD:
Sequential 10 workers Speedup
Clone 32.288 s ± 0.580 s 30.724 s ± 0.522 s 1.05 ± 0.03
Checkout I 54.172 s ± 7.119 s 54.429 s ± 6.738 s 1.00 ± 0.18
Checkout II 40.465 s ± 2.402 s 38.682 s ± 1.365 s 1.05 ± 0.07
Linux NFS server (v4.1, on EBS, single availability zone):
Sequential 32 workers Speedup
Clone 240.368 s ± 6.347 s 57.349 s ± 0.870 s 4.19 ± 0.13
Checkout I 242.862 s ± 2.215 s 58.700 s ± 0.904 s 4.14 ± 0.07
Checkout II 65.751 s ± 1.577 s 23.820 s ± 0.407 s 2.76 ± 0.08
EFS (v4.1, replicated over multiple availability zones):
Sequential 32 workers Speedup
Clone 922.321 s ± 2.274 s 210.453 s ± 3.412 s 4.38 ± 0.07
Checkout I 1011.300 s ± 7.346 s 297.828 s ± 0.964 s 3.40 ± 0.03
Checkout II 294.104 s ± 1.836 s 126.017 s ± 1.190 s 2.33 ± 0.03
The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.
To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:
sequential fallback 10 workers speedup
10 files 772.3 ms ± 12.6 ms 769.0 ms ± 13.6 ms 1.00 ± 0.02
20 files 780.5 ms ± 15.8 ms 775.2 ms ± 9.2 ms 1.01 ± 0.02
50 files 806.2 ms ± 13.8 ms 767.4 ms ± 8.5 ms 1.05 ± 0.02
100 files 833.7 ms ± 21.4 ms 750.5 ms ± 16.8 ms 1.11 ± 0.04
200 files 897.6 ms ± 30.9 ms 730.5 ms ± 14.7 ms 1.23 ± 0.05
500 files 1035.4 ms ± 48.0 ms 677.1 ms ± 22.3 ms 1.53 ± 0.09
1000 files 1244.6 ms ± 35.6 ms 654.0 ms ± 38.3 ms 1.90 ± 0.12
2000 files 1488.8 ms ± 53.4 ms 658.8 ms ± 23.8 ms 2.26 ± 0.12
From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.
Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.
About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use multiple worker processes to distribute the queued entries and call
write_pc_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could affect
performance due to lock contention in the kernel. Work stealing (or any
other format of re-distribution) is not implemented yet.
The protocol between the main process and the workers is quite simple.
They exchange binary messages packed in pkt-line format, and use
PKT-FLUSH to mark the end of input (from both sides). The main process
starts the communication by sending N pkt-lines, each corresponding to
an item that needs to be written. These packets contain all the
necessary information to load, smudge, and write the blob associated
with each item. Then it waits for the worker to send back N pkt-lines
containing the results for each item. The resulting packet must contain:
the identification number of the item that it refers to, the status of
the operation, and the lstat() data gathered after writing the file (iff
the operation was successful).
For now, checkout always uses a hardcoded value of 2 workers, only to
demonstrate that the parallel checkout framework correctly divides and
writes the queued entries. The next patch will add user configurations
and define a more reasonable default, based on tests with the said
settings.
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.
Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:
- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
detects it by looking for EEXIST and EISDIR errors after an
open(O_CREAT | O_EXCL) failure.
- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
at the has_dirs_only_path() check, which is done for the leading path
of each item in the parallel checkout queue.
Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).
After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document changes in -m and --diff-merges=m semantics, as well as new
--diff-merges=on option.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New log.diffMerges configuration variable sets the format that
--diff-merges=on will be using. The default is "separate".
t4013: add the following tests for log.diffMerges config:
* Test that wrong values are denied.
* Test that the value of log.diffMerges properly affects both
--diff-merges=on and -m.
t9902: fix completion tests for log.d* to match log.diffMerges.
Added documentation for log.diffMerges.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let -m option (and --diff-merges=m) enable the default format instead
of "separate", to be able to tune it with log.diffMerges option.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split set_diff_merges() into separate parsing and execution functions,
the former to be reused for parsing of configuration values later in
the patch series.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the notion of default diff format for merges, and the option
"on" to select it. The default format is "separate" and can't yet
be changed, so effectively "on" is just a synonym for "separate"
for now. Add corresponding test to t4013.
This is in preparation for introducing log.diffMerges configuration
option that will let --diff-merges=on to be configured to any
supported format.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug the ort merge backend throughout the rest of the system, and
start testing it as a replacement for the recursive backend.
* en/ort-readiness:
Add testing with merge-ort merge strategy
t6423: mark remaining expected failure under merge-ort as such
Revert "merge-ort: ignore the directory rename split conflict for now"
merge-recursive: add a bunch of FIXME comments documenting known bugs
merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflict
t: mark several submodule merging tests as fixed under merge-ort
merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries
t6428: new test for SKIP_WORKTREE handling and conflicts
merge-ort: support subtree shifting
merge-ort: let renormalization change modify/delete into clean delete
merge-ort: have ll_merge() use a special attr_index for renormalization
merge-ort: add a special minimal index just for renormalization
merge-ort: use STABLE_QSORT instead of QSORT where required
Various rename detection optimization to help "ort" merge strategy
backend.
* en/ort-perf-batch-10:
diffcore-rename: determine which relevant_sources are no longer relevant
merge-ort: record the reason that we want a rename for a file
diffcore-rename: add computation of number of unknown renames
diffcore-rename: check if we have enough renames for directories early on
diffcore-rename: only compute dir_rename_count for relevant directories
merge-ort: record the reason that we want a rename for a directory
merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type
diffcore-rename: take advantage of "majority rules" to skip more renames
Aliased command lookup accesses the `list` variable before it has been
set, causing an error in "nounset" mode. Initialize to an empty string
to avoid that.
$ git nonexistent-command <Tab>bash: list: unbound variable
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a remote has the skipFetchAll setting enabled, then that remote is
not intended for frequent fetching. It makes sense to not fetch that
data during the 'prefetch' maintenance task. Skip that remote in the
iteration without error. The skip_default_update member is initialized
in remote.c:handle_config() as part of initializing the 'struct remote'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'prefetch' maintenance task previously forced the following refspec
for each remote:
+refs/heads/*:refs/prefetch/<remote>/*
If a user has specified a more strict refspec for the remote, then this
prefetch task downloads more objects than necessary.
The previous change introduced the '--prefetch' option to 'git fetch'
which manipulates the remote's refspec to place all resulting refs into
refs/prefetch/, with further partitioning based on the destinations of
those refspecs.
Update the documentation to be more generic about the destination refs.
Do not mention custom refspecs explicitly, as that does not need to be
highlighted in this documentation. The important part of placing refs in
refs/prefetch/ remains.
Reported-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --prefetch option will be used by the 'prefetch' maintenance task
instead of sending refspecs explicitly across the command-line. The
intention is to modify the refspec to place all results in
refs/prefetch/ instead of anywhere else.
Create helper method filter_prefetch_refspec() to modify a given refspec
to fit the rules expected of the prefetch task:
* Negative refspecs are preserved.
* Refspecs without a destination are removed.
* Refspecs whose source starts with "refs/tags/" are removed.
* Other refspecs are placed within "refs/prefetch/".
Finally, we add the 'force' option to ensure that prefetch refs are
replaced as necessary.
There are some interesting cases that are worth testing.
An earlier version of this change dropped the "i--" from the loop that
deletes a refspec item and shifts the remaining entries down. This
allowed some refspecs to not be modified. The subtle part about the
first --prefetch test is that the "refs/tags/*" refspec appears directly
before the "refs/heads/bogus/*" refspec. Without that "i--", this
ordering would remove the "refs/tags/*" refspec and leave the last one
unmodified, placing the result in "refs/heads/*".
It is possible to have an empty refspec. This is typically the case for
remotes other than the origin, where users want to fetch a specific tag
or branch. To correctly test this case, we need to further remove the
upstream remote for the local branch. Thus, we are testing a refspec
that will be deleted, leaving nothing to fetch.
Helped-by: Tom Saeger <tom.saeger@oracle.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apparently this is not supported with Microsoft's Universal C Runtime.
So let's not actually do that.
Instead, just return success because we _know_ that we expect the `NUL`
device to be present.
Side note: it is possible to turn off the "Null device driver" and
thereby disable `NUL`. Too many things are broken if this driver is
disabled, therefore it is not worth bothering to try to detect its
presence when `access()` is called.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On write() errors, packet_write() dies with the same error message that
is already printed by its callee, packet_write_gently(). This produces
an unnecessarily verbose and repetitive output:
error: packet write failed
fatal: packet write failed: <strerror() message>
In addition to that, packet_write_gently() does not always fulfill its
caller expectation that errno will be properly set before a non-zero
return. In particular, that is not the case for a "data exceeds max
packet size" error. So, in this case, packet_write() will call
die_errno() and print an strerror(errno) message that might be totally
unrelated to the actual error.
Fix both those issues by turning packet_write() and
packet_write_gently() into wrappers to a common lower level function
that doesn't print the error message, but instead returns it on a buffer
for the caller to die() or error() as appropriate.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" now takes "--3way" and "--cached" at the same time, and
work and record results only in the index.
* jz/apply-3way-cached:
git-apply: allow simultaneous --cached and --3way options
The command line completion (in contrib/) has learned that
CHERRY_PICK_HEAD is a possible pseudo-ref.
* ab/complete-cherry-pick-head:
bash completion: complete CHERRY_PICK_HEAD
"git apply --3way" has always been "to fall back to 3-way merge
only when straight application fails". Swap the order of falling
back so that 3-way is always attempted first (only when the option
is given, of course) and then straight patch application is used as
a fallback when it fails.
* jz/apply-run-3way-first:
git-apply: try threeway first when "--3way" is used
A command such as `git push -qu origin feature` will print "Branch
'feature' set up to track remote branch 'feature' from 'origin'." even
when --quiet is passed. In this case it's because install_branch_config() is
always called with BRANCH_CONFIG_VERBOSE.
struct transport keeps track of the desired verbosity. Fix the above
issue by passing BRANCH_CONFIG_VERBOSE conditionally based on that.
Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The same "do not capitalize the first word" rule is applied to both
our patch titles and error messages, but the existing description
was fuzzy in two aspects.
* For error messages, it was not said that this was only about the
first word that begins the sentence.
* For both, it was not clear when a capital letter there was not an
error. We avoid capitalizing the first word when the only reason
you would capitalize it is because it happens to be the first
word in the sentence. If a proper noun, which is usually spelled
in capital letters, happens to come at the beginning of the
sentence, it should be kept in capital letters.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A sparse-index loads the name-hash data for its entries, including the
sparse-directory entries. If a caller asks for a path that is contained
within a sparse-directory entry, we need to expand to a full index and
recalculate the name hash table before returning the result. Insert
calls to expand_to_path() to protect against this case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.
When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.
Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.
The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.
There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.
To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sparse directory entries represent a directory that is outside the
sparse-checkout definition. These are not paths to blobs, so should not
be added to the name_hash table. Instead, they should be added to the
directory hashtable when 'ignore_case' is true.
Add a condition to avoid placing sparse directories into the name_hash
hashtable. This avoids filling the table with extra entries that will
never be queried.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all index entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior. This case could
be integrated later by ensuring that we walk the tree in the
sparse-directory entry, but the current behavior is only expecting
blobs. Save this integration for later when it can be properly tested.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid missing files.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one so we do not miss blobs to scan. Later, this can
integrate more carefully with sparse indexes with proper testing.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When verifying all blobs reachable from the index, ensure that a sparse
index has been expanded to a full one to avoid missing some blobs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index has
been expanded to a full one to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two loops iterate over all cache entries, so ensure that a sparse
index is expanded to a full index before we do so.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries in the checkout builtin, ensure
that we have a full index to avoid any unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before we iterate over all cache entries, ensure that the index is not
sparse. This loop in checkout_all() might be safe to iterate over a
sparse index, but let's put this protection here until it can be
carefully tested.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Soon we will insert ensure_full_index() calls across the codebase.
Instead of also adding include statements for sparse-index.h, let's just
use the fact that anything that cares about the index already has
cache.h in its includes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.
In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.
Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Edit and expand the sparse-index design document with the plan for
guarding index operations with ensure_full_index().
Notably, the plan has changed to not have an expand_to_path() method in
favor of checking for a sparse-directory hit inside of the
index_path_pos() API.
The changes that follow this one will incrementally add
ensure_full_index() guards to iterations over all cache entries. Some
iterations over the cache entries are not protected due to a few
categories listed in the document. Since these are not being modified,
here is a short list of the files and methods that will not receive
these guards:
Looking for non-zero stage:
* builtin/add.c:chmod_pathspec()
* builtin/merge.c:count_unmerged_entries()
* merge-ort.c:record_conflicted_index_entries()
* read-cache.c:unmerged_index()
* rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
* revision.c:prepare_show_merge()
* sequencer.c:append_conflicts_hint()
* wt-status.c:wt_status_collect_changes_initial()
Looking for submodules:
* builtin/submodule--helper.c:module_list_compute()
* submodule.c: several methods
* worktree.c:validate_no_submodules()
Part of the index API:
* name-hash.c: lazy init methods
* preload-index.c:preload_thread(), preload_index()
* read-cache.c: file format methods
Checking for correct order of cache entries:
* read-cache.c:check_ce_order()
Ignores SKIP_WORKTREE entries or already aware:
* unpack-trees.c:mark_new_skip_worktree()
* wt-status.c:wt_status_check_sparse_checkout()
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command-line completion script (in contrib/) had a couple of
references that would have given a warning under the "-u" (nounset)
option.
* vs/completion-with-set-u:
completion: audit and guard $GIT_* against unset use
"gitweb" learned "e-mail privacy" feature to redact strings that
look like e-mail addresses on various pages.
* gk/gitweb-redacted-email:
gitweb: add "e-mail privacy" feature to redact e-mail addresses
Clean-up codepaths that implements "git send-email --validate"
option and improves the message from it.
* ab/send-email-validate-errors:
git-send-email: improve --validate error output
git-send-email: refactor duplicate $? checks into a function
git-send-email: test full --validate output
Streamline the codepath to fix the UTF-8 encoding issues in the
argv[] and the prefix on macOS.
* tb/precompose-prefix-simplify:
macOS: precompose startup_info->prefix
precompose_utf8: make precompose_string_if_needed() public
A configuration variable has been added to force tips of certain
refs to be given a reachability bitmap.
* tb/pack-preferred-tips-to-give-bitmap:
builtin/pack-objects.c: respect 'pack.preferBitmapTips'
t/helper/test-bitmap.c: initial commit
pack-bitmap: add 'test_bitmap_commits()' helper
A NULL-dereference bug has been corrected in an error codepath in
"git for-each-ref", "git branch --list" etc.
* jk/ref-filter-segfault-fix:
ref-filter: fix NULL check for parse object failure
Correct documentation added in e544221d97 (trace2:
Documentation/technical/api-trace2.txt, 2019-02-22) to state that
calling BUG() also emits an "error" event. See ee4512ed48 (trace2:
create new combined trace facility, 2019-02-22) for the initial
implementation.
The BUG() function did not emit an event then however, that was only
changed later in 0a9dde4a04 (usage: trace2 BUG() invocations,
2021-02-05), that commit changed the code, but didn't update any of
the docs.
Let's also add a cross-reference from api-error-handling.txt.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the BUG() function was added in d8193743e0 (usage.c: add BUG()
function, 2017-05-12) these docs added in 1f23cfe0ef (doc: document
error handling functions and conventions, 2014-12-03) were not
updated. Let's do that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ee4512ed48 (trace2: create new combined trace facility,
2019-02-22) we started with two copies of this comment,
0ee10fd129 (usage: add trace2 entry upon warning(), 2020-11-23) added
a third. Let's instead add an earlier comment that applies to all
these mostly-the-same functions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finish the removal I started in 1108cea7f8 (tests: remove most uses
of test_i18ncmp, 2021-02-11). At that time the function wasn't removed
due to disruption with in-flight changes, remove the occurrences that
have landed since then.
As of writing this there are no test_i18ncmp uses between "master" and
"seen", so let's also remove the function to finally put it to rest.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --exclude-promisor-objects is given, before traversing any objects
we iterate over all of the objects in any promisor packs, marking them
as UNINTERESTING and SEEN. We turn the oid we get from iterating the
pack into an object with parse_object(), but this has two problems:
- it's slow; we are zlib inflating (and reconstructing from deltas)
every byte of every object in the packfile
- it leaves the tree buffers attached to their structs, which means
our heap usage will grow to store every uncompressed tree
simultaneously. This can be gigabytes.
We can obviously fix the second by freeing the tree buffers after we've
parsed them. But we can observe that the function doesn't look at the
object contents at all! The only reason we call parse_object() is that
we need a "struct object" on which to set the flags. There are two
options here:
- we can look up just the object type via oid_object_info(), and then
call the appropriate lookup_foo() function
- we can call lookup_unknown_object(), which gives us an OBJ_NONE
struct (which will get auto-converted later by object_as_type() via
calls to lookup_commit(), etc).
The first one is closer to the current code, but we do pay the price to
look up the type for each object. The latter should be more efficient in
CPU, though it wastes a little bit of memory (the "unknown" object
structs are a union of all object types, so some of the structs are
bigger than they need to be). It also runs the risk of triggering a
latent bug in code that calls lookup_object() directly but isn't ready
to handle OBJ_NONE (such code would already be buggy, but we use
lookup_unknown_object() infrequently enough that it might be hiding).
I went with the second option here. I don't think the risk is high (and
we'd want to find and fix any such bugs anyway), and it should be more
efficient overall.
The new tests in p5600 show off the improvement (this is on git.git):
Test HEAD^ HEAD
-------------------------------------------------------------------------------
5600.5: count commits 0.37(0.37+0.00) 0.38(0.38+0.00) +2.7%
5600.6: count non-promisor commits 11.74(11.37+0.37) 0.04(0.03+0.00) -99.7%
The improvement is particularly big in this script because _every_
object in the newly-cloned partial repo is a promisor object. So after
marking them all, there's nothing left to traverse.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the other lookup_foo() functions take a repository argument, but
lookup_unknown_object() was never converted, and it uses the_repository
internally. Let's fix that.
We could leave a wrapper that uses the_repository, but there aren't that
many calls, so we'll just convert them all. I looked briefly at each
site to see if we had a repository struct (besides the_repository) we
could pass, but none of them do (so this conversion to pass
the_repository is a pure noop in each case, though it does take us one
step closer to eventually getting rid of the_repository).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To get the list of all promisor objects, we not only include all objects
in promisor packs, but also parse each of those objects to see which
objects they reference. After parsing a tree object, the tree->buffer
field will remain populated until we explicitly free it. So in a partial
clone of blob:none, for example, we are essentially reading every tree
in the repository (since they're all in the initial promisor pack), and
keeping all of their uncompressed contents in memory at once.
This patch frees the tree buffers after we've finished marking all of
their reachable objects. We shouldn't need to do this for any other
object type. While we are using some extra memory to store the structs,
no other object type stores the whole contents in its parsed form (we do
sometimes hold on to commit buffers, but less so these days due to
commit graphs, plus most commands which care about promisor objects turn
off the save_commit_buffer global).
Even for a moderate-sized repository like git.git, this patch drops the
peak heap (as measured by massif) for git-fsck from ~1.7GB to ~138MB.
Fsck is a good candidate for measuring here because it doesn't interact
with the promisor code except to call is_promisor_object(), so we can
isolate just this problem.
The added perf test shows only a tiny improvement on my machine for
git.git, since 1.7GB isn't enough to cause any real memory pressure:
Test HEAD^ HEAD
--------------------------------------------------------------------------------
5600.4: fsck 21.26(20.90+0.35) 20.84(20.79+0.04) -2.0%
With linux.git the absolute change is a bit bigger, though still a small
percentage:
Test HEAD^ HEAD
-----------------------------------------------------------------------------
5600.4: fsck 262.26(259.13+3.12) 254.92(254.62+0.29) -2.8%
I didn't have the patience to run it under massif with linux.git, but
it's probably on the order of about 14GB improvement, since that's the
sum of the sizes of all of the uncompressed trees (but still isn't
enough to create memory pressure on this particular machine, which has
64GB of RAM). Smaller machines would probably see a bigger effect on
runtime (and sadly our perf suite does not measure peak heap).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref backend API uses errno as a sideband error channel.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new method uses the update_index counter, which isn't susceptible to clock
inaccuracies.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor a test added in 83c9433e67 (git-svn: support for git-svn
propset, 2014-12-07) to avoid using "set -e" in the test body. Let's
move this into a setup test using "test_expect_success" instead.
While I'm at it refactor:
* Repeated "mkdir" to "mkdir -p"
* Uses of "touch" to creating the files with ">" instead
* The "rm -rf" at the end to happen in a "test_when_finished"
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the immediate "rm -rf .git" from the start of this test. This
was added back in 41337e22f0 (git-svn: add tests for command-line
usage of init and clone commands, 2007-11-17) when there was a "trash"
directory shared by all the tests, but ever since abc5d372ec (Enable
parallel tests, 2008-08-08) we've had per-test trash directories.
So this setup can simply be removed. We could use
TEST_NO_CREATE_REPO=true, but I don't think it's worth the effort to
go out of our way to be different. It doesn't matter that we now have
a redundant .git at the top-level.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When serving a clone or fetch with bitmaps, after deciding which objects
need to be sent our "pack reuse" mechanism kicks in: we try to send
more-or-less verbatim a bunch of objects from the beginning of the
bitmapped packfile without even adding them to the to_pack.objects
array.
After deciding which objects will be in the "reused" portion, we update
nr_result to account for those, and then trigger display_progress() to
show the user (who is undoubtedly dazzled that we managed to enumerate
so many objects so quickly).
But then something confusing happens: the "Enumerating objects" progress
meter jumps _backwards_, counting up from zero the number of objects we
actually add into to_pack.objects.
This worked correctly once upon a time, but was broken in 5af050437a
(pack-objects: show some progress when counting kept objects,
2018-04-15), when the latter half of that progress meter switched to
using a separate nr_seen counter, rather than nr_result. Nobody noticed
for two reasons:
- prior to the pack-reuse fixes from a14aebeac3 (Merge branch
'jk/packfile-reuse-cleanup', 2020-02-14), the reuse code almost
never kicked in anyway
- the output looks _kind of_ correct. The "backwards" moment is hard
to catch, because we overwrite the old progress number with the new
one, and the larger number is displayed only for a second. So unless
you look at that exact second, you just see the much smaller value,
counting up to the number of non-reused objects (though of course if
you catch it in stderr, or look at GIT_TRACE_PACKET from a server
with bitmaps, you can see both values).
This smaller output isn't wrong per se, but isn't counting what we ever
intended to. We should give the user the whole number of objects we
considered (which, as per 5af050437a's original purpose, is already
_not_ a count of what goes into to_pack.objects). The follow-on
"Counting objects" meter shows the actual number of objects we feed into
that array.
We can easily fix this by bumping (and showing) nr_seen for the
pack-reused objects. When the included test is run without this patch,
the second pack-objects invocation produces "Enumerating objects: 1" to
show the one loose object, even though the resulting pack has hundreds
of objects in it. With it, we jump to "Enumerating objects: 674" after
deciding on reuse, and then "675" when we add in the loose object.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
versions could be an empty string_list. In that case, versions->items is
NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
that results in undefined behaviour).
Moreover we only use the results of this calculation once when calling
QSORT. Therefore we choose to skip creating relevant_entries and call
QSORT directly with our manipulated pointers (but only if there's data
requiring sorting). This lets us avoid abusing the string_list API,
and saves us from having to explain why this abuse is OK.
Finally, an assertion is added to make sure that write_tree() is called
with a valid offset.
This issue has probably existed since:
ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
But it only started occurring during tests since tests started using
merge-ort:
f3b964a07e (Add testing with merge-ort merge strategy, 2021-03-20)
For reference - here's the original UBSAN commit that implemented this
check, it sounds like this behaviour isn't actually likely to cause any
issues (but we might as well fix it regardless):
https://reviews.llvm.org/D67122
UBSAN output from t3404 or t5601:
merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
#0 0x78bb53 in write_tree merge-ort.c:2669:43
#1 0x7856c9 in process_entries merge-ort.c:3303:2
#2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
#3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
#4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
#5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
#6 0x6ef055 in single_pick sequencer.c:4814:9
#7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
#8 0x4fb392 in run_sequencer revert.c:225:9
#9 0x4fa5b0 in cmd_revert revert.c:235:8
#10 0x42abd7 in run_builtin git.c:453:11
#11 0x429531 in handle_builtin git.c:704:3
#12 0x4282fb in run_argv git.c:771:4
#13 0x4282fb in cmd_main git.c:902:19
#14 0x524b63 in main common-main.c:52:11
#15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#16 0x4072b9 in _start start.S:120
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.
No change in behaviour is expected from this commit given that there are
no filters yet for those object types.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Re-order the sections of a few manual pages to be consistent with the
entirety of the rest of our documentation. This allows us to remove
the just-added whitelist of "bad" order from
lint-man-section-order.perl.
I'm doing that this way around so that code will be easy to dig up if
we'll need it in the future. I've intentionally not added some other
sections such as EXAMPLES to the list of known sections.
If we were to add that we'd find some out of order. Perhaps we'll want
to order those consistently as well in the future, at which point
whitelisting some of them might become handy again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a linting script to check the relative order of the sections in
the documentation. We should have NAME, then SYNOPSIS, DESCRIPTION,
OPTIONS etc. in that order.
That holds true throughout our documentation, except for a few
exceptions which are hardcoded in the linting script.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lint for and fix the three manual pages that were missing the standard
"Part of the linkgit:git[1] suite" end section.
We only do this for the man[157] section documents (we don't have
anything outside those sections), not files to be included,
howto *.txt files etc.
We could also add this to the existing (and then renamed)
lint-gitlink.perl, but I'm not doing that here.
Obviously all of that fits in one script, but I think for something
like this that's a one-off script with global variables it's much
harder to follow when a large part of your script is some if/else or
keeping/resetting of state simply to work around the script doing two
things instead of one.
Especially because in this case this script wants to process the file
as one big string, but lint-gitlink.perl wants to look at it one line
at a time. We could also consolidate this whole thing and
t/check-non-portable-shell.pl, but that one likes to join lines as
part of its shell parsing.
So let's just add another script, whole scaffolding is basically:
use strict;
use warnings;
sub report { ... }
my $code = 0;
while (<>) { ... }
exit $code;
We'd spend more lines effort trying to consolidate them than just
copying that around.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lint-gitlink.perl script added in ab81411ced (ci: validate
"linkgit:" in documentation, 2016-05-04) was more complex than it
needed to be. It:
- Was using File::Find to recursively find *.txt files in
Documentation/, let's instead use the Makefile as a source of truth
for *.txt files, and pass it down to the script.
- We now don't lint linkgit:* in RelNotes/* or technical/*, which we
shouldn't have been doing in the first place anyway.
- When the doc-diff script was added in beb188e22a (add a script to
diff rendered documentation, 2018-08-06) we started sometimes having
a "git worktree" under Documentation/.
This tree contains a full checkout of git.git, as a result the
"lint" script would recurse into that, and lint any *.txt file
found in that entire repository.
In practice the only in-tree "linkgit" outside of the
Documentation/ tree is contrib/contacts/git-contacts.txt and
contrib/subtree/git-subtree.txt, so this wouldn't emit any errors
Now we instead simply trust the Makefile to give us *.txt files.
Since the Makefile also knows what sections each page should be in we
don't have to open the files ourselves and try to parse that out. As a
bonus this will also catch bugs with the section line in the files
themselves being incorrect.
The structure of the new script is mostly based on
t/check-non-portable-shell.pl. As an added bonus it will also use
pos() to print where the problems it finds are, e.g. given an issue
like:
diff --git a/Documentation/git-cherry.txt b/Documentation/git-cherry.txt
[...]
and line numbers. git-cherry therefore detects when commits have been
-"copied" by means of linkgit:git-cherry-pick[1], linkgit:git-am[1] or
-linkgit:git-rebase[1].
+"copied" by means of linkgit:git-cherry-pick[2], linkgit:git-am[3] or
+linkgit:git-rebase[4].
We'll now emit:
git-cherry.txt:20: error: git-cherry-pick[2]: wrong section (should be 1), shown with 'HERE' below:
git-cherry.txt:20: '"copied" by means of linkgit:git-cherry-pick[2]' <-- HERE
git-cherry.txt:20: error: git-am[3]: wrong section (should be 1), shown with 'HERE' below:
git-cherry.txt:20: '"copied" by means of linkgit:git-cherry-pick[2], linkgit:git-am[3]' <-- HERE
git-cherry.txt:21: error: git-rebase[4]: wrong section (should be 1), shown with 'HERE' below:
git-cherry.txt:21: 'linkgit:git-rebase[4]' <-- HERE
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend this script added in ab81411ced (ci: validate "linkgit:" in
documentation, 2016-05-04) to pass under "use strict", and add a "use
warnings" for good measure.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Re-introduce a variable to declare what *.txt files need to be
considered for the purposes of scouring files to generate a dependency
graph of includes.
When doc.dep was introduced in a5ae8e64cf (Fix documentation
dependency generation., 2005-11-07) we had such a variable called
TEXTFILES, but it was refactored away just a few commits after that in
fb612d54c1 (Documentation: fix dependency generation.,
2005-11-07). I'm planning to add more wildcards here, so let's bring
it back.
I'm not calling it TEXTFILES because we e.g. don't consider
Documentation/technical/*.txt when generating the graph (they don't
use includes). Let's instead call it DOC_DEP_TXT.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor occurrences of $(wildcard howto/*.txt) into a single
HOWTO_TXT variable for readability and consistency.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bug in how --no-reschedule-failed-exec interacts with
rebase.rescheduleFailedExec=true being set in the config. Before this
change the --no-reschedule-failed-exec config option would be
overridden by the config.
This bug happened because of the particulars of how "rebase" works
v.s. most other git commands when it comes to parsing options and
config:
When we read the config and parse the CLI options we correctly prefer
the --no-reschedule-failed-exec option over
rebase.rescheduleFailedExec=true in the config. So far so good.
However the --reschedule-failed-exec option doesn't take effect when
the rebase starts (we'd just create a
".git/rebase-merge/reschedule-failed-exec" file if it was true). It
only takes effect when the exec command fails, at which point we'll
reschedule the failed "exec" command.
Since we only wrote out the positive
".git/rebase-merge/reschedule-failed-exec" under
--reschedule-failed-exec, but nothing with --no-reschedule-failed-exec
we'll forget that we asked not to reschedule failed "exec", and would
happily re-read the config and see that
rebase.rescheduleFailedExec=true is set.
So the config will effectively override the user having explicitly
disabled the option on the command-line.
Even more confusingly: Since rebase accepts different options based on
its state there wasn't even a way to get around this with "rebase
--continue --no-reschedule-failed-exec" (but you could of course set
the config with "rebase -c ...").
I think the least bad way out of this is to declare that for such
options and config whatever we decide at the beginning of the rebase
goes. So we'll now always create either a "reschedule-failed-exec" or
a "no-reschedule-failed-exec file at the start, not just the former if
we decided we wanted the feature.
With this new worldview you can no longer change the setting once a
rebase has started except by manually removing the state files
discussed above. I think making it work like that is the the least
confusing thing we can do.
In the future we might want to learn to change the setting in the
middle by combining "--edit-todo" with
"--[no-]reschedule-failed-exec", we currently don't support combining
those options, or any other way to change the state in the middle of
the rebase short of manually editing the files in
".git/rebase-merge/*".
The bug being fixed here originally came about because of a
combination of the behavior of the code added in d421afa0c6 (rebase:
introduce --reschedule-failed-exec, 2018-12-10) and the addition of
the config variable in 969de3ff0e (rebase: add a config option to
default to --reschedule-failed-exec, 2018-12-10).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a test added in 906b63942a (rebase --am: ignore
rebase.rescheduleFailedExec, 2019-07-01) to camel-case the
configuration variable. This doesn't change the behavior of the test,
it's merely to help its human readers.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move processing of tags into its own function to make the logic easier
to extend when we're going to implement filtering for tags. No change in
behaviour is expected from this commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly
provided by the user or not. The most important use case for this is
when filtering objects: only objects that were not explicitly requested
will get filtered.
The flag is currently only set for blobs and trees, which has been fine
given that there are no filters for tags or commits currently. We're
about to extend filtering capabilities to add object type filter though,
which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
not set, the object wouldn't get filtered at all.
Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
Like this, explicitly provided parents stay user-given and thus
unfiltered, while parents which get loaded as part of the graph walk
can be filtered.
This commit shouldn't have any user-visible impact yet as there is no
logic to filter commits yet.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `uploadpackfilter.allow` is set to `true`, it means that filters
are enabled by default except in the case where a filter is explicitly
disabled via `uploadpackilter.<filter>.allow`. This option will not only
enable the currently supported set of filters, but also any filters
which get added in the future. As such, an admin which wants to have
tight control over which filters are allowed and which aren't probably
shouldn't ever set `uploadpackfilter.allow=true`.
Amend the documentation to make the ramifications more explicit so that
admins are aware of this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will need this functionality independent of the rest
of send_fetch_request(), so put this into its own function.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will need part, but not all, of the functionality in
add_haves(), so move some of its functionality to its sole caller
send_fetch_request().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will need part, but not all, of the functionality in
process_acks(), so move some of its functionality to its sole caller
do_fetch_pack_v2(). As a side effect, the resulting code is also
shorter.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In send_fetch_request(), "object-format" is written directly to the file
descriptor, as opposed to the other arguments, which are buffered.
Buffer "object-format" as well. "object-format" must be buffered; in
particular, it must appear after "command=fetch" in the request.
This divergence was introduced in 4b831208bb ("fetch-pack: parse and
advertise the object-format capability", 2020-05-27), perhaps as an
oversight (the surrounding code at the point of this commit has already
been using a request buffer.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gitweb extracts content from the Git log and makes it accessible
over HTTP. As a result, e-mail addresses found in commits are
exposed to web crawlers and they may not respect robots.txt.
This can result in unsolicited messages.
Introduce an 'email-privacy' feature which redacts e-mail addresses
from the generated HTML content. Specifically, obscure addresses
retrieved from the the author/committer and comment sections of the
Git log. The feature is off by default.
This feature does not prevent someone from downloading the
unredacted commit log, e.g., by cloning the repository, and
extracting information from it. It aims to hinder the low-
effort, bulk collection of e-mail addresses by web crawlers.
Signed-off-by: Georgios Kontaxis <geko1702+commits@99rst.org>
Acked-by: Eric Wong <e@80x24.org>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We auto-generate the list of supported configuration variables from
'Documentation/config/*.txt', and that list used to be created by the
'generate-cmdlist.sh' helper script and stored in the 'command-list.h'
header. Commit 709df95b78 (help: move list_config_help to
builtin/help, 2020-04-16) extracted this into a dedicated
'generate-configlist.sh' script and 'config-list.h' header, and added
a new target in the 'Makefile' as well, but while doing so it forgot
to extract the dependencies of the latter. Consequently, since then
'config-list.h' is not re-generated when 'Documentation/config/*.txt'
is updated, while 'command-list.h' is re-generated unnecessarily:
$ touch Documentation/config/log.txt
$ make -j4
GEN command-list.h
CC help.o
AR libgit.a
Fix this and list all config-related documentation files as
dependencies of 'config-list.h' and remove them from the dependencies
of 'command-list.h'.
$ touch Documentation/config/log.txt
$ make
GEN config-list.h
CC builtin/help.o
LINK git
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git add` refrains from adding or updating index entries that are
outside the current sparse checkout, but `git rm` doesn't follow the
same restriction. This is somewhat counter-intuitive and inconsistent.
So make `rm` honor the sparsity rules and advise on how to remove
SKIP_WORKTREE entries just like `add` does. Also add some tests for the
new behavior.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git add` already refrains from updating SKIP_WORKTREE entries, but it
silently exits with zero code when it is asked to do so. Instead, let's
warn the user and display a hint on how to update these entries.
Note that we only warn the user whey they give a pathspec item that
matches no eligible path for updating, but it does match one or more
SKIP_WORKTREE entries. A warning was chosen over erroring out right away
to reproduce the same behavior `add` already exhibits with ignored
files. This also allow users to continue their workflow without having
to invoke `add` again with only the eligible paths (as those will have
already been added).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
refresh_index() doesn't update SKIP_WORKTREE entries, but it still
matches them against the given pathspecs, marks the matches on the
seen[] array, check if unmerged, etc. In the following patch, one caller
will need refresh_index() to ignore SKIP_WORKTREE entries entirely, so
add a flag that implements this behavior.
While we are here, also realign the REFRESH_* flags and convert the hex
values to the more natural bit shift format, which makes it easier to
spot holes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new enum parameter to `add_pathspec_matches_against_index()` and
`find_pathspecs_matching_against_index()`, allowing callers to specify
whether these function should attempt to match SKIP_WORKTREE entries or
not. This will be used in a future patch to make `git add` display a
warning when it is asked to update SKIP_WORKTREE entries.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have a couple tests for `add` with SKIP_WORKTREE entries in
t7012, but these only cover the most basic scenarios. As we will be
changing how `add` deals with sparse paths in the subsequent commits,
let's move these two tests to their own file and add more test cases
for different `add` options and situations. This also demonstrates two
options that don't currently respect SKIP_WORKTREE entries: `--chmod`
and `--renormalize`.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git add --refresh <pathspec>` doesn't find any matches for the
given pathspec, it prints an error message using the `match` field of
the `struct pathspec_item`. However, this field doesn't contain the
magic part of the pathspec. Instead, let's use the `original` field.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a diff driver for Scheme-like languages which recognizes top level
and local `define` forms, whether it is a function definition, binding,
syntax definition or a user-defined `define-xyzzy` form.
Also supports R6RS `library` forms, `module` forms along with class and
struct declarations used in Racket (PLT Scheme).
Alternate "def" syntax such as those in Gerbil Scheme are also
supported, like defstruct, defsyntax and so on.
The rationale for picking `define` forms for the hunk headers is because
it is usually the only significant form for defining the structure of
the program, and it is a common pattern for schemers to have local
function definitions to hide their visibility, so it is not only the top
level `define`'s that are of interest. Schemers also extend the language
with macros to provide their own define forms (for example, something
like a `define-test-suite`) which is also captured in the hunk header.
Since it is common practice to extend syntax with variants of a form
like `module+`, `class*` etc, those have been supported as well.
The word regex is a best-effort attempt to conform to R7RS[1] valid
identifiers, symbols and numbers.
[1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1)
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git daemon" has been tightened against systems that take backslash
as directory separator.
* rs/daemon-sanitize-dir-sep:
daemon: sanitize all directory separators
The ort merge backend has been optimized by skipping irrelevant
renames.
* en/ort-perf-batch-9:
diffcore-rename: avoid doing basename comparisons for irrelevant sources
merge-ort: skip rename detection entirely if possible
merge-ort: use relevant_sources to filter possible rename sources
merge-ort: precompute whether directory rename detection is needed
merge-ort: introduce wrappers for alternate tree traversal
merge-ort: add data structures for an alternate tree traversal
merge-ort: precompute subset of sources for which we need rename detection
diffcore-rename: enable filtering possible rename sources
"git cherry-pick/revert" with or without "--[no-]edit" did not spawn
the editor as expected (e.g. "revert --no-edit" after a conflict
still asked to edit the message), which has been corrected.
* en/sequencer-edit-upon-conflict-fix:
sequencer: fix edit handling for cherry-pick and revert messages
An on-disk reverse-index to map the in-pack location of an object
back to its object name across multiple packfiles is introduced.
* tb/reverse-midx:
midx.c: improve cache locality in midx_pack_order_cmp()
pack-revindex: write multi-pack reverse indexes
pack-write.c: extract 'write_rev_file_order'
pack-revindex: read multi-pack reverse indexes
Documentation/technical: describe multi-pack reverse indexes
midx: make some functions non-static
midx: keep track of the checksum
midx: don't free midx_name early
midx: allow marking a pack as preferred
t/helper/test-read-midx.c: add '--show-objects'
builtin/multi-pack-index.c: display usage on unrecognized command
builtin/multi-pack-index.c: don't enter bogus cmd_mode
builtin/multi-pack-index.c: split sub-commands
builtin/multi-pack-index.c: define common usage with a macro
builtin/multi-pack-index.c: don't handle 'progress' separately
builtin/multi-pack-index.c: inline 'flags' with options
"git clone --reject-shallow" option fails the clone as soon as we
notice that we are cloning from a shallow repository.
* ll/clone-reject-shallow:
builtin/clone.c: add --reject-shallow option
Simplify the test added in 9466e3809d (blame: enable funcname blaming
with userdiff driver, 2020-11-01) to use the --author support recently
added in 999cfc4f45 (test-lib functions: add --author support to
test_commit, 2021-01-12).
We also did not need the full fortran-external-function content. Let's
cut it down to just the important parts.
I'm modifying it to demonstrate that the fortran-specific userdiff
function is in effect by adding "DO NOT MATCH ..." and "AS THE ..."
lines surrounding the "RIGHT" one.
This is to check that we're using the userdiff "fortran" driver, as
opposed to the default driver which would match on those lines as part
of the general heuristic of matching a line that doesn't begin with
whitespace.
The test had also been leaving behind a .gitattributes file for later
tests to possibly trip over, let's clean it up with
"test_when_finished".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor a test added in 9466e3809d (blame: enable funcname blaming
with userdiff driver, 2020-11-01) so that the blame tests don't rely
on stealing the contents of "t/t4018/fortran-external-function".
I have another patch series that'll possibly (or not) refactor that
file, but having this test inter-dependency makes things simple in any
case by making this test more readable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There have been no "broken" tests since 75c3b6b2e8 (userdiff: improve
Fortran xfuncname regex, 2020-08-12). Let's remove the test support
for them.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the userdiff test to list the builtin drivers via the
test-tool, using the new for_each_userdiff_driver() API function.
This gets rid of the need to modify this part of the test every time a
new pattern is added, see 2ff6c34612 (userdiff: support Bash,
2020-10-22) and 09dad9256a (userdiff: support Markdown, 2020-05-02)
for two recent examples.
I only need the "list-builtin-drivers "argument here, but let's add
"list-custom-drivers" and "list-drivers" too, just because it's easy.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 122aa6f9c0 (diff: introduce diff.<driver>.binary, 2008-10-05)
the internals of the userdiff.c code have understood a "default" name,
which is invoked as userdiff_find_by_name("default") and present in
the "builtin_drivers" struct. Let's test for this special case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the userdiff_find_by_namelen() function so that a new
for_each_userdiff_driver() API function does most of the work.
This will be useful for the same reason we've got other for_each_*()
API functions as part of various APIs, and will be used in a follow-up
commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Declare the pascal pattern consistently with how we declare the
others, not having "\n" on one line by itself, but as part of the
pattern, and when there are alterations have the "|" at the start, not
end of the line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those patterns which were declared with a regex on the same
line as the "PATTERNS()" line to put that regex on the next line, and
add missing "/* -- */" separator comments between the pattern and
word_regex.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Address some old code smell and move around the built-in userdiff
drivers so they're both in alphabetical order, and now in the same
order they appear in the gitattributes(5) documentation.
The two started drifting in be58e70dba (diff: unify external diff and
funcname parsing code, 2008-10-05), and then even further in
80c49c3de2 (color-words: make regex configurable via attributes,
2009-01-17) when the "cpp" pattern was added.
There are no functional changes here, and as --color-moved will show
only moved existing lines.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a use of GIT_TEST_GETTEXT_POISON added in f276e2a469 (config:
improve error message for boolean config, 2021-02-11).
This was simultaneously in-flight with my d162b25f95 (tests: remove
support for GIT_TEST_GETTEXT_POISON, 2021-01-20) which removed the
rest of the GIT_TEST_GETTEXT_POISON code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
$GIT_COMPLETION_SHOW_ALL and $GIT_TESTING_ALL_COMMAND_LIST were used
without guarding against them being unset, causing errors in nounset
(set -u) mode.
No other nounset-unsafe $GIT_* usages were found.
While at it, remove a superfluous (duplicate) unset guard from $GIT_DIR
in __git_find_repo_path.
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" does not allow "--cached" and "--3way" to be used
together, since "--3way" writes conflict markers into the working
tree.
Allow "git apply" to accept "--cached" and "--3way" at the same
time. When a single file auto-resolves cleanly, the result is
placed in the index at stage #0 and the command exits with 0 status.
For a file that has a conflict which cannot be cleanly
auto-resolved, the original contents from common ancestor (stage
conflict at the content level, and the command exists with non-zero
status, because there is no place (like the working tree) to leave a
half-resolved merge for the user to resolve.
The user can use `git diff` to view the contents of the conflict, or
`git checkout -m -- .` to regenerate the conflict markers in the
working directory.
Don't attempt rerere in this case since it depends on conflict
markers written to file for its database storage and lookup. There
would be two main changes required to get rerere working:
1. Allow the rerere api to accept in memory object rather than
files, which would allow us to pass in the conflict markers
contained in the result from ll_merge().
2. Rerere can't write to the working directory, so it would have to
apply the result to cache stage #0 directly. A flag would be
needed to control this.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fsck API clean-up.
* ab/fsck-api-cleanup:
fetch-pack: use new fsck API to printing dangling submodules
fetch-pack: use file-scope static struct for fsck_options
fetch-pack: don't needlessly copy fsck_options
fsck.c: move gitmodules_{found,done} into fsck_options
fsck.c: add an fsck_set_msg_type() API that takes enums
fsck.c: pass along the fsck_msg_id in the fsck_error callback
fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
fsck.c: give "FOREACH_MSG_ID" a more specific name
fsck.c: undefine temporary STR macro after use
fsck.c: call parse_msg_type() early in fsck_set_msg_type()
fsck.h: re-order and re-assign "enum fsck_msg_type"
fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
fsck.c: remove (mostly) redundant append_msg_id() function
fsck.c: rename variables in fsck_set_msg_type() for less confusion
fsck.h: use "enum object_type" instead of "int"
fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
fsck.c: refactor and rename common config callback
A few option description strings started with capital letters,
which were corrected.
* cc/downcase-opt-help:
column, range-diff: downcase option description
SECURITY.md that is facing individual contributors and end users
has been introduced. Also a procedure to follow when preparing
embargoed releases has been spelled out.
* js/security-md:
Document how we do embargoed releases
SECURITY: describe how to report vulnerabilities
Optimize "rev-list --use-bitmap-index --objects" corner case that
uses negative tags as the stopping points.
* ps/pack-bitmap-optim:
pack-bitmap: avoid traversal of objects referenced by uninteresting tag
"git commit" learned "--trailer <key>[=<value>]" option; together
with the interpret-trailers command, this will make it easier to
support custom trailers.
* zh/commit-trailer:
commit: add --trailer option
The hashwrite() API uses a buffering mechanism to avoid calling
write(2) too frequently. This logic has been refactored to be
easier to understand.
* ds/clarify-hashwrite:
csum-file: make hashwrite() more readable
Plug or annotate remaining leaks that trigger while running the
very basic set of tests.
* ah/plugleaks:
transport: also free remote_refs in transport_disconnect()
parse-options: don't leak alias help messages
parse-options: convert bitfield values to use binary shift
init-db: silence template_dir leak when converting to absolute path
init: remove git_init_db_config() while fixing leaks
worktree: fix leak in dwim_branch()
clone: free or UNLEAK further pointers when finished
reset: free instead of leaking unneeded ref
symbolic-ref: don't leak shortened refname in check_symref()
When e.g. in a failed cherry pick we did not recognize
CHERRY_PICK_HEAD as we do e.g. REBASE_HEAD in a failed rebase let's
rectify that.
When REBASE_HEAD was added in fbd7a23237 (rebase: introduce and use
pseudo-ref REBASE_HEAD, 2018-02-11) a completion was added for it, but
no corresponding completion existed for CHERRY_PICK_HEAD added in
d7e5c0cbfb (Introduce CHERRY_PICK_HEAD, 2011-02-19).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The apply_fragments() method of "git apply"
can silently apply patches incorrectly if
a file has repeating contents. In these
cases a three-way merge is capable of applying
it correctly in more situations, and will
show a conflict rather than applying it
incorrectly. However, because the patches
apply "successfully" using apply_fragments(),
git will never fall back to the merge, even
if the "--3way" flag is used, and the user has
no way to ensure correctness by forcing the
three-way merge method.
Change the behavior so that when "--3way" is used,
git will always try the three-way merge first and
will only fall back to apply_fragments() in cases
where blobs are not available or some other error
(but not in the case of a merge conflict).
Since user-facing results will be different,
this has backwards compatibility implications
for users depending on the old behavior. In
addition, the three-way merge will be slower
than direct patch application.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous logic filled a string list with the names of each remote,
but instead we could simply run the appropriate 'git fetch' data
directly in the remote iterator. Do this for reduced code size, but also
because it sets up an upcoming change to use the remote's refspec. This
data is accessible from the 'struct remote' data that is now accessible
in fetch_remote().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the output we emit on --validate error to:
* Say "FILE:LINE" instead of "FILE: LINE", to match "grep -n",
compiler error messages etc.
* Don't say "patch contains a" after just mentioning the filename,
just leave it at "FILE:LINE: is longer than[...]. The "contains a"
sounded like we were talking about the file in general, when we're
actually checking it line-by-line.
* Don't just say "rejected by sendemail-validate hook", but combine
that with the system_or_msg() output to say what exit code the hook
died with.
I had an aborted attempt to make the line length checker note all
lines that were longer than the limit. I didn't think that was worth
the effort, but I've left in the testing change to check that we die
as soon as we spot the first long line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the tests that grep substrings out of the output to use a full
test_cmp, in preparation for improving the output.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like 'get_murmur3' and 'generate_filter', 'get_filter_for_commit' is a
subcommand of `test-tool bloom` not of `test-tool` itself.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "prefix" was precomposed for macOS in commit 5c327502 (MacOS:
precompose_argv_prefix(), 2021-02-03).
However, this commit forgot to update "startup_info->prefix" after
precomposing.
Move the (possible) precomposition towards the end of
setup_git_directory_gently(), so that precompose_string_if_needed()
can use git_config_get_bool("core.precomposeunicode") correctly.
Keep prefix, startup_info->prefix and GIT_PREFIX_ENVIRONMENT all in sync.
And as a result, the prefix no longer needs to be precomposed in git.c
Reported-by: Dmitry Torilov <d.torilov@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 5c327502 (MacOS: precompose_argv_prefix(), 2021-02-03) uses
the function precompose_string_if_needed() internally. It is only
used from precompose_argv_prefix() and therefore static in
compat/precompose_utf8.c
Expose this function, it will be used in the next commit.
While there, allow passing a NULL pointer, which will return NULL.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two among the three warnings raised by "make git.info" are related to the fact
that the preface has not id in user-manual.txt.
user-manual.texi:15: warning: empty menu entry name in `* : idm4.'
user-manual.texi:141: warning: @unnumbered missing argument
This causes asciidoc creating an empty preface and an empty title tag in
user-manual.xml which turns to be an empty node in user-manual.texi and
git.info. Consequently, one can notice in user-manual.texi and git.info
a node named "idm4" in the menu and the navigation bar. In emacs, the
first entry of the menu in the git info page is even displayed as empty.
This fix will name "Introduction" the preface and assign it an id.
The result can be seen in the files: user-manual.{xml, texi, html, pdf}
and git.info.
For future reference, the diff between old and new user-manual.xml,
user-manual.texi, git.info, user-manual.html (converted through
html2markdown) and user-manual.pdf (converted through pdftotext) are
attached.
--- before/user-manual.xml 2021-04-04 03:58:47.758008722 +0200
+++ after/user-manual.xml 2021-04-04 03:56:40.520551163 +0200
@@ -7,8 +7,8 @@
<bookinfo>
<title>Git User Manual</title>
</bookinfo>
-<preface>
-<title></title>
+<preface id="_introduction">
+<title>Introduction</title>
<simpara>Git is a fast distributed revision control system.</simpara>
<simpara>This manual is designed to be readable by someone with basic UNIX
command-line skills, but no previous knowledge of Git.</simpara>
--- before/user-manual.texi 2021-04-04 03:58:47.490005652 +0200
+++ after/user-manual.texi 2021-04-04 03:56:40.520551163 +0200
@@ -7,12 +7,12 @@
* Git: (git). A fast distributed revision control system
@end direntry
-@node Top, idm4, , (dir)
+@node Top, Introduction, , (dir)
@documentlanguage en
@top Git User Manual
@menu
-* : idm4.
+* Introduction::
* Repositories and Branches::
* Exploring Git history::
* Developing with Git::
@@ -137,8 +137,8 @@
@end detailmenu
@end menu
-@node idm4, Repositories and Branches, Top, Top
-@unnumbered
+@node Introduction, Repositories and Branches, Top, Top
+@unnumbered Introduction
Git is a fast distributed revision control system.
@@ -178,7 +178,7 @@
Finally, see @ref{Notes and todo list for this manual} for ways that you can help make this manual more
complete.
-@node Repositories and Branches, Exploring Git history, idm4, Top
+@node Repositories and Branches, Exploring Git history, Introduction, Top
@chapter Repositories and Branches
@menu
--- before/git.info 2021-04-04 03:58:46.557994966 +0200
+++ after/git.info 2021-04-04 03:56:40.520551163 +0200
@@ -7,14 +7,14 @@
END-INFO-DIR-ENTRY
-File: git.info, Node: Top, Next: idm4, Up: (dir)
+File: git.info, Node: Top, Next: Introduction, Up: (dir)
Git User Manual
***************
* Menu:
-* : idm4.
+* Introduction::
* Repositories and Branches::
* Exploring Git history::
* Developing with Git::
@@ -137,7 +137,10 @@
-File: git.info, Node: idm4, Next: Repositories and Branches, Prev: Top, Up: Top
+File: git.info, Node: Introduction, Next: Repositories and Branches, Prev: Top, Up: Top
+
+Introduction
+************
Git is a fast distributed revision control system.
@@ -174,7 +177,7 @@
that you can help make this manual more complete.
-File: git.info, Node: Repositories and Branches, Next: Exploring Git history, Prev: idm4, Up: Top
+File: git.info, Node: Repositories and Branches, Next: Exploring Git history, Prev: Introduction, Up: Top
1 Repositories and Branches
***************************
@@ -5471,207 +5474,207 @@
...
Tag Table:
Node: Top212
-Node: idm43164
-Node: Repositories and Branches4465
...
+Node: Introduction3179
+Node: Repositories and Branches4515
+Node: How to get a Git repository5128
...
End Tag Table
--- before/user-manual.html.md 2021-04-04 05:20:55.378695854 +0200
+++ after/user-manual.html.md 2021-04-04 05:21:11.282850802 +0200
@@ -4,6 +4,8 @@
**Table of Contents**
+Introduction
+
1\. Repositories and Branches
@@ -278,7 +280,7 @@
Todo list
-#
+# Introduction
Git is a fast distributed revision control system.
--- before/user-manual.pdf.txt 2021-04-04 05:28:20.367036836 +0200
+++ after/user-manual.pdf.txt 2021-04-04 05:30:01.680026312 +0200
@@ -487,6 +487,7 @@
vii
+Introduction
Git is a fast distributed revision control system.
This manual is designed to be readable by someone with basic UNIX command-line skills, but no previous knowledge of Git.
Chapter 1 and Chapter 2 explain how to fetch and study a project using git—read these chapters to learn how to build and test a
Signed-off-by: Firmin Martin <firminmartin24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch -v<n>" learned to allow a reroll count that is
not an integer.
* zh/format-patch-fractional-reroll-count:
format-patch: allow a non-integral version numbers
A simple IPC interface gets introduced to build services like
fsmonitor on top.
* jh/simple-ipc:
t0052: add simple-ipc tests and t/helper/test-simple-ipc tool
simple-ipc: add Unix domain socket implementation
unix-stream-server: create unix domain socket under lock
unix-socket: disallow chdir() when creating unix domain sockets
unix-socket: add backlog size option to unix_stream_listen()
unix-socket: eliminate static unix_stream_socket() helper function
simple-ipc: add win32 implementation
simple-ipc: design documentation for new IPC mechanism
pkt-line: add options argument to read_packetized_to_strbuf()
pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option
pkt-line: do not issue flush packets in write_packetized_*()
pkt-line: eliminate the need for static buffer in packet_write_gently()
Preparatory API changes for parallel checkout.
* mt/parallel-checkout-part-1:
entry: add checkout_entry_ca() taking preloaded conv_attrs
entry: move conv_attrs lookup up to checkout_entry()
entry: extract update_ce_after_write() from write_entry()
entry: make fstat_output() and read_blob_entry() public
entry: extract a header file for entry.c functions
convert: add classification for conv_attrs struct
convert: add get_stream_filter_ca() variant
convert: add [async_]convert_to_working_tree_ca() variants
convert: make convert_attrs() and convert structs public
While using "map" instead of "for" or "map" instead of "grep" and
vice-versa makes for interesting trivia questions when interviewing
Perl programmers, it doesn't make for very readable code. Let's
refactor this loop initially added in 8fd5bb7f44 (git send-email: add
--annotate option, 2008-11-11) to be a for-loop instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't show the very verbose $(FIND_SOURCE_FILES) command on every
"make TAGS" invocation.
Let's use "generate into temporary and rename to the final file,
after seeing the command that generated the output finished
successfully" pattern, to avoid leaving a file with an incorrect
output generated by a failed command.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a lot of pointer dereferencing in the pre-image version of
'midx_pack_order_cmp()', which this patch gets rid of.
Instead of comparing the pack preferred-ness and then the pack id, both
of these checks are done at the same time by using the high-order bit of
the pack id to represent whether it's preferred. Then the pack id and
offset are compared as usual.
This produces the same result so long as there are less than 2^31 packs,
which seems like a likely assumption to make in practice.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format describe a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.
Unfortunately, a very similar comparison function as the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.
Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this field.
(In midx.c:write_midx_internal(), the two adjacent if statements share a
conditional, but are written separately since the first one will
eventually also handle the MIDX_WRITE_BITMAP flag, which does not yet
exist.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Existing callers provide the reverse index code with an array of 'struct
pack_idx_entry *'s, which is then sorted by pack order (comparing the
offsets of each object within the pack).
Prepare for the multi-pack index to write a .rev file by providing a way
to write the reverse index without an array of pack_idx_entry (which the
MIDX code does not have).
Instead, callers can invoke 'write_rev_index_positions()', which takes
an array of uint32_t's. The ith entry in this array specifies the ith
object's (in index order) position within the pack (in pack order).
Expose this new function for use in a later patch, and rewrite the
existing write_rev_file() in terms of this new function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement reading for multi-pack reverse indexes, as described in the
previous patch.
Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.
There are three new functions exposed by the revindex API:
- load_midx_revindex(): loads the reverse index corresponding to the
given multi-pack index.
- midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
multi-pack index and pseudo-pack order.
load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.
load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.
pack_pos_to_midx() simply reads the corresponding entry out of the
table.
midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the psuedo-pack order, but that order can only be recovered
in the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.
So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a prerequisite to implementing multi-pack bitmaps, motivate and
describe the format and ordering of the multi-pack reverse index.
The subsequent patch will implement reading this format, and the patch
after that will implement writing it while producing a multi-pack index.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit, pack-revindex.c will become responsible for
sorting a list of objects in the "MIDX pack order" (which will be
defined in the following patch). To do so, it will need to be know the
pack identifier and offset within that pack for each object in the MIDX.
The MIDX code already has functions for doing just that
(nth_midxed_offset() and nth_midxed_pack_int_id()), but they are
statically declared.
Since there is no reason that they couldn't be exposed publicly, and
because they are already doing exactly what the caller in
pack-revindex.c will want, expose them publicly so that they can be
reused there.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_midx_internal() uses a hashfile to write the multi-pack index, but
discards its checksum. This makes sense, since nothing that takes place
after writing the MIDX cares about its checksum.
That is about to change in a subsequent patch, when the optional
reverse index corresponding to the MIDX will want to include the MIDX's
checksum.
Store the checksum of the MIDX in preparation for that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent patch will need to refer back to 'midx_name' later on in
the function. In fact, this variable is already free()'d later on, so
this makes the later free() no longer redundant.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.
Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).
To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.
No format changes are required to store the preferred pack, since it
will be able to be inferred with a corresponding MIDX bitmap, by looking
up the pack associated with the object in the first bit position (this
ordering is described in detail in a subsequent commit).
[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some scenarios, users may want more history than the repository
offered for cloning, which happens to be a shallow repository, can
give them. But because users don't know it is a shallow repository
until they download it to local, we may want to refuse to clone
this kind of repository, without creating any unnecessary files.
The '--depth=x' option cannot be used as a solution; the source may
be deep enough to give us 'x' commits when cloned, but the user may
later need to deepen the history to arbitrary depth.
Teach '--reject-shallow' option to "git clone" to abort as soon as
we find out that we are cloning from a shallow repository.
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we run parse_object_buffer() to get an object's contents, we try
to check that the return value wasn't NULL. However, since our "struct
object" is a pointer-to-pointer, and we assign like:
*obj = parse_object_buffer(...);
it's not correct to check:
if (!obj)
That will always be true, since our double pointer will continue to
point to the single pointer (which is itself NULL). This is a regression
that was introduced by aa46a0da30 (ref-filter: use oid_object_info() to
get object, 2018-07-17); since that commit we'll segfault on a parse
failure, as we try to look at the NULL object pointer.
There are many ways a parse could fail, but most of them are hard to set
up in the tests (it's easy to make a bogus object, but update-ref will
refuse to point to it). The test here uses a tag which points to a wrong
object type. A parse of just the broken tag object will succeed, but
seeing both tag objects in the same process will lead to a parse error
(since we'll see the pointed-to object as both types).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new pack with a bitmap, it is sometimes convenient to
indicate some reference prefixes which should receive priority when
selecting which commits to receive bitmaps.
A truly motivated caller could accomplish this by setting
'pack.islandCore', (since all commits in the core island are similarly
marked as preferred) but this requires callers to opt into using delta
islands, which they may or may not want to do.
Introduce a new multi-valued configuration, 'pack.preferBitmapTips' to
allow callers to specify a list of reference prefixes. All references
which have a prefix contained in 'pack.preferBitmapTips' will mark their
tips as "preferred" in the same way as commits are marked as preferred
for selection by 'pack.islandCore'.
The choice of the verb "prefer" is intentional: marking the NEEDS_BITMAP
flag on an object does *not* guarantee that that object will receive a
bitmap. It merely guarantees that that commit will receive a bitmap over
any *other* commit in the same window by bitmap_writer_select_commits().
The test this patch adds reflects this quirk, too. It only tests that
a commit (which didn't receive bitmaps by default) is selected for
bitmaps after changing the value of 'pack.preferBitmapTips' to include
it. Other commits may lose their bitmaps as a byproduct of how the
selection process works (bitmap_writer_select_commits() ignores the
remainder of a window after seeing a commit with the NEEDS_BITMAP flag).
This configuration will aide in selecting important references for
multi-pack bitmaps, since they do not respect the same pack.islandCore
configuration. (They could, but doing so may be confusing, since it is
packs--not bitmaps--which are influenced by the delta-islands
configuration).
In a fork network repository (one which lists all forks of a given
repository as remotes), for example, it is useful to set
pack.preferBitmapTips to 'refs/remotes/<root>/heads' and
'refs/remotes/<root>/tags', where '<root>' is an opaque identifier
referring to the repository which is at the base of the fork chain.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new 'bitmap' test-tool which can be used to list the commits that
have received bitmaps.
In theory, a determined tester could run 'git rev-list --test-bitmap
<commit>' to check if '<commit>' received a bitmap or not, since
'--test-bitmap' exits with a non-zero code when it can't find the
requested commit.
But this is a dubious behavior to rely on, since arguably 'git
rev-list' could continue its object walk outside of which commits are
covered by bitmaps.
This will be used to test the behavior of 'pack.preferBitmapTips', which
will be added in the following patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next patch will add a 'bitmap' test-tool which prints the list of
commits that have bitmaps computed.
The test helper could implement this itself, but it would need access to
the 'bitmaps' field of the 'pack_bitmap' struct. To avoid exposing this
private detail, implement the entirety of the helper behind a
test_bitmap_commits() function in pack-bitmap.c.
There is some precedence for this with test_bitmap_walk() which is used
to implement the '--test-bitmap' flag in 'git rev-list' (and is also
implemented in pack-bitmap.c).
A caller will be added in the next patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
save_opts() should save any non-default values. It was intended to do
this, but since most options in struct replay_opts default to 0, it only
saved non-zero values. Unfortunately, this does not always work for
options.edit. Roughly speaking, options.edit had a default value of 0
for cherry-pick but a default value of 1 for revert. Make save_opts()
record a value whenever it differs from the default.
options.edit was also overly simplistic; we had more than two cases.
The behavior that previously existed was as follows:
Non-conflict commits Right after Conflict
revert Edit iff isatty(0) Edit (ignore isatty(0))
cherry-pick No edit See above
Specify --edit Edit (ignore isatty(0)) See above
Specify --no-edit (*) See above
(*) Before stopping for conflicts, No edit is the behavior. After
stopping for conflicts, the --no-edit flag is not saved so see
the first two rows.
However, the expected behavior is:
Non-conflict commits Right after Conflict
revert Edit iff isatty(0) Edit iff isatty(0)
cherry-pick No edit Edit iff isatty(0)
Specify --edit Edit (ignore isatty(0)) Edit (ignore isatty(0))
Specify --no-edit No edit No edit
In order to get the expected behavior, we need to change options.edit
to a tri-state: unspecified, false, or true. When specified, we follow
what it says. When unspecified, we need to check whether the current
commit being created is resolving a conflict as well as consulting
options.action and isatty(0). While at it, add a should_edit() utility
function that compresses options.edit down to a boolean based on the
additional information for the non-conflict case.
continue_single_pick() is the function responsible for resuming after
conflict cases, regardless of whether there is one commit being picked
or many. Make this function stop assuming edit behavior in all cases,
so that it can correctly handle !isatty(0) and specific requests to not
edit the commit message.
Reported-by: Renato Botelho <garga@freebsd.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain pieces of the format-patch output upfront before the rest
of the documentation starts referring to them.
* jc/doc-format-patch-clarify:
format-patch: give an overview of what a "patch" message is
Remove the final hint that we used to have a scripted "git rebase".
* ab/remove-rebase-usebuiltin:
rebase: remove transitory rebase.useBuiltin setting & env
When accessing a server with a URL like https://user:pass@site/, we
did not to fall back to the basic authentication with the
credential material embedded in the URL after the "Negotiate"
authentication failed. Now we do.
* cs/http-use-basic-after-failed-negotiate:
remote-curl: fall back to basic auth if Negotiate fails
More test coverage over "diff --no-index".
* ab/diff-no-index-tests:
diff --no-index tests: test mode normalization
diff --no-index tests: add test for --exit-code
Code simplification by removing support for a caller that is long gone.
* ab/read-tree:
tree.h API: simplify read_tree_recursive() signature
tree.h API: expose read_tree_1() as read_tree_at()
archive: stop passing "stage" through read_tree_recursive()
ls-files: refactor away read_tree()
ls-files: don't needlessly pass around stage variable
tree.c API: move read_tree() into builtin/ls-files.c
ls-files tests: add meaningful --with-tree tests
show tests: add test for "git show <tree>"
When "git checkout" removes a path that does not exist in the
commit it is checking out, it wasn't careful enough not to follow
symbolic links, which has been corrected.
* mt/checkout-remove-nofollow:
checkout: don't follow symlinks when removing entries
symlinks: update comment on threaded_check_leading_path()
p2000-sparse-operations.sh compares different Git commands in
repositories with many files at HEAD but using sparse-checkout to focus
on a small portion of those files.
Add extra copies of the repository that use the sparse-index format so
we can track how that affects the performance of different commands.
At this point in time, the sparse-index is 100% overhead from the CPU
front, and this is measurable in these tests:
Test
---------------------------------------------------------------
2000.2: git status (full-index-v3) 0.59(0.51+0.12)
2000.3: git status (full-index-v4) 0.59(0.52+0.11)
2000.4: git status (sparse-index-v3) 1.40(1.32+0.12)
2000.5: git status (sparse-index-v4) 1.41(1.36+0.08)
2000.6: git add -A (full-index-v3) 2.32(1.97+0.19)
2000.7: git add -A (full-index-v4) 2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3) 2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4) 2.30(2.20+0.13)
2000.10: git add . (full-index-v3) 2.39(2.02+0.20)
2000.11: git add . (full-index-v4) 2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3) 2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4) 2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3) 2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4) 2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3) 3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4) 3.01(2.94+0.15)
Note that there is very little difference between the v3 and v4 index
formats when the sparse-index is enabled. This is primarily due to the
fact that the relative file sizes are the same, and the command time is
mostly taken up by parsing tree objects to expand the sparse index into
a full one.
With the current file layout, the index file sizes are given by this
table:
| full index | sparse index |
+-------------+--------------+
v3 | 108 MiB | 1.6 MiB |
v4 | 80 MiB | 1.2 MiB |
Future updates will improve the performance of Git commands when the
index is sparse.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE
is enabled, which it is by default in the test suite. The logic must
be adjusted for the presence of these directory entries.
For now, leave the test as a simple check for whether the directory
entry is sparse. Do not go any further until needed.
This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh uses the test suite and hence this is enabled
for all tests. We need to integrate with it before we run our
performance tests with a sparse-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.
When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use 'git sparse-checkout init --cone --sparse-index' to toggle the
sparse-index feature. It makes sense to also disable it when running
'git sparse-checkout disable'. This is particularly important because it
removes the extensions.sparseIndex config option, allowing other tools
to use this Git repository again.
This does mean that 'git sparse-checkout init' will not re-enable the
sparse-index feature, even if it was previously enabled.
While testing this feature, I noticed that the sparse-index was not
being written on the first run, but by a second. This was caught by the
call to 'test-tool read-cache --table'. This requires adjusting some
assignments to core_apply_sparse_checkout and pl.use_cone_patterns in
the sparse_checkout_init() logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse index extension is used to signal that index writes should be
in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1.
Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that
specifies if the sparse index should be used. It also updates the index
to use the correct format, either way. Add a warning in the
documentation that the use of a repository extension might reduce
compatibility with third-party tools. 'git sparse-checkout init' already
sets extension.worktreeConfig, which places most sparse-checkout users
outside of the scope of most third-party tools.
Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of
GIT_TEST_SPARSE_INDEX=1.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When enabled, this config option signals that index writes should
attempt to use sparse-directory entries.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test case that uses test_region to ensure that we are truly
expanding a sparse index to a full one, then converting back to sparse
when writing the index. As we integrate more Git commands with the
sparse index, we will convert these commands to check that we do _not_
convert the sparse index to a full index and instead stay sparse the
entire time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index_pos_by_traverse_info() currently throws a BUG() when a
directory entry exists exactly in the index. We need to consider that it
is possible to have a directory in a sparse index as long as that entry
is itself marked with the skip-worktree bit.
The 'pos' variable is assigned a negative value if an exact match is not
found. Since a directory name can be an exact match, it is no longer an
error to have a nonnegative 'pos' value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A submodule is stored as a "Git link" that actually points to a commit
within a submodule. Submodules are populated or not depending on
submodule configuration, not sparse-checkout. To ensure that the
sparse-index feature integrates correctly with submodules, we should not
collapse a directory if there is a Git link within its range.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.
For now, we avoid converting the index to a sparse index if:
1. the index is split.
2. the index is already sparse.
3. sparse-checkout is disabled.
4. sparse-checkout does not use cone mode.
Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.
The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.
The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.
Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."
We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.
The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index format does not currently allow for sparse directory entries.
This violates some expectations that older versions of Git or
third-party tools might not understand. We need an indicator inside the
index file to warn these tools to not interact with a sparse index
unless they are aware of sparse directory entries.
Add a new _required_ index extension, 'sdir', that indicates that the
index may contain sparse directory entries. This allows us to continue
to use the differences in index formats 2, 3, and 4 before we create a
new index version 5 in a later change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we modify the sparse-checkout definition, we perform index operations
on a pattern_list that only exists in-memory. This allows easy backing
out in case the index update fails.
However, if the index write itself cares about the sparse-checkout
pattern set, we need access to that in-memory copy. Place a pointer to
a 'struct pattern_list' in the index so we can access this on-demand.
This will be used in the next change which uses the sparse-checkout
definition to filter out directories that are outside the sparse cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next change will translate full indexes into sparse indexes at write
time. The existing logic provides a way for every sparse index to be
expanded to a full index at read time. However, there are cases where an
index is written and then continues to be used in-memory to perform
further updates.
unpack_trees() is frequently called after such a write. In particular,
commands like 'git reset' do this double-update of the index.
Ensure that we have a full index when entering unpack_trees(), but only
when command_requires_full_index is true. This is always true at the
moment, but we will later relax that after unpack_trees() is updated to
handle sparse directory entries.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will use 'test-tool read-cache --table' to check that a sparse
index is written as part of init_repos. Since we will no longer always
expand a sparse index into a full index, add an '--expand' parameter
that adds a call to ensure_full_index() so we can compare a sparse index
directly against a full index, or at least what the in-memory index
looks like when expanded in this way.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This table is helpful for discovering data in the index to ensure it is
being written correctly, especially as we build and test the
sparse-index. This table includes an output format similar to 'git
ls-tree', but should not be compared to that directly. The biggest
reasons are that 'git ls-tree' includes a tree entry for every
subdirectory, even those that would not appear as a sparse directory in
a sparse-index. Further, 'git ls-tree' does not use a trailing directory
separator for its tree rows.
This does not print the stat() information for the blobs. That will be
added in a future change with another option. The tests that are added
in the next few changes care only about the object types and IDs.
However, this future need for full index information justifies the need
for this test helper over extending a user-facing feature, such as 'git
ls-files'.
To make the option parsing slightly more robust, wrap the string
comparisons in a loop adapted from test-dir-iterator.c.
Care must be taken with the final check for the 'cnt' variable. We
continue the expectation that the numerical value is the final argument.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new 'sparse-index' repo alongside the 'full-checkout' and
'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also
add run_on_sparse and test_sparse_match helpers. These helpers will be
used when the sparse index is implemented.
Add the GIT_TEST_SPARSE_INDEX environment variable to enable the
sparse-index by default. This can be enabled across all tests, but that
will only affect cases where the sparse-checkout feature is enabled.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will mark an in-memory index_state as having sparse directory entries
with the sparse_index bit. These currently cannot exist, but we will add
a mechanism for collapsing a full index to a sparse one in a later
change. That will happen at write time, so we must first allow parsing
the format before writing it.
Commands or methods that require a full index in order to operate can
call ensure_full_index() to expand that index in-memory. This requires
parsing trees using that index's repository.
Sparse directory entries have a specific 'ce_mode' value. The macro
S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type.
This ce_mode is not possible with the existing index formats, so we don't
also verify all properties of a sparse-directory entry, which are:
1. ce->ce_mode == 0040000
2. ce->flags & CE_SKIP_WORKTREE is true
3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
4. ce->oid references a tree object.
These are all semi-enforced in ensure_full_index() to some extent. Any
deviation will cause a warning at minimum or a failure in the worst
case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upcoming changes will introduce modifications to the index format that
allow sparse directories. It will be useful to have a mechanism for
converting those sparse index files into full indexes by walking the
tree at those sparse directories. Name this method ensure_full_index()
as it will guarantee that the index is fully expanded.
This method is not implemented yet, and instead we focus on the
scaffolding to declare it and call it at the appropriate time.
Add a 'command_requires_full_index' member to struct repo_settings. This
will be an indicator that we need the index in full mode to do certain
index operations. This starts as being true for every command, then we
will set it to false as some commands integrate with sparse indexes.
If 'command_requires_full_index' is true, then we will immediately
expand a sparse index to a full one upon reading from disk. This
suffices for now, but we will want to add more callers to
ensure_full_index() later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test was introduced in 19a0acc83e (t1092: test interesting
sparse-checkout scenarios, 2021-01-23), but it contains issues with quoting
that were not noticed until starting this follow-up series. The old
mechanism would drop quoting such as in
test_all_match git commit -m "touch README.md"
The above happened to work because README.md is a file in the
repository, so 'git commit -m touch REAMDE.md' would succeed by
accident.
Other cases included quoting for no good reason, so clean that up now.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated
trees of width four. This results in nearly one million blob entries in
the index. Then, we can clone this repository with sparse-checkout
patterns that demonstrate four copies of the initial repository. Each
clone will use a different index format or mode so peformance can be
tested across the different options.
Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.
Run a few Git commands on these clones, especially those that use the
index (status, add, commit).
Here are the results on my Linux machine:
Test
--------------------------------------------------------------
2000.2: git status (full-index-v3) 0.37(0.30+0.09)
2000.3: git status (full-index-v4) 0.39(0.32+0.10)
2000.4: git add -A (full-index-v3) 1.42(1.06+0.20)
2000.5: git add -A (full-index-v4) 1.26(0.98+0.16)
2000.6: git add . (full-index-v3) 1.40(1.04+0.18)
2000.7: git add . (full-index-v4) 1.26(0.98+0.17)
2000.8: git commit -a -m A (full-index-v3) 1.42(1.11+0.16)
2000.9: git commit -a -m A (full-index-v4) 1.33(1.08+0.16)
It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example) this ratio is less pronounced than in
similarly-sized real repositories.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This begins a long effort to update the index format to allow sparse
directory entries. This should result in a significant improvement to
Git commands when HEAD contains millions of files, but the user has
selected many fewer files to keep in their sparse-checkout definition.
Currently, the index format is only updated in the presence of
extensions.sparseIndex instead of increasing a file format version
number. This is temporary, and index v5 is part of the plan for future
work in this area.
The design document details many of the reasons for embarking on this
work, and also the plan for completing it safely.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.
In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.
To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When given a sub-command that it doesn't understand, 'git
multi-pack-index' dies with the following message:
$ git multi-pack-index bogus
fatal: unrecognized subcommand: bogus
Instead of 'die()'-ing, we can display the usage text, which is much
more helpful:
$ git.compile multi-pack-index bogus
error: unrecognized subcommand: bogus
usage: git multi-pack-index [<options>] write
or: git multi-pack-index [<options>] verify
or: git multi-pack-index [<options>] expire
or: git multi-pack-index [<options>] repack [--batch-size=<size>]
--object-dir <file> object directory containing set of packfile and pack-index pairs
--progress force progress reporting
While we're at it, clean up some duplication between the "no sub-command"
and "unrecognized sub-command" conditionals.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even before the recent refactoring, 'git multi-pack-index' calls
'trace2_cmd_mode()' before verifying that the sub-command is recognized.
Push this call down into the individual sub-commands so that we don't
enter a bogus command mode.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.
This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command).
Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).
Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor out the usage message into pieces corresponding to each mode.
This avoids options specific to one sub-command from being shared with
another in the usage.
A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.
Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.
Right now, options and flags are handled separately. That's fine, since
the options structure is never passed around. But a future patch will
make it so that common options shared by all sub-commands are defined in
a common location. That means that "flags" would have to become a global
variable.
Group it with the options structure so that we reduce the number of
global variables we have overall.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is customary not to begin the help text for each option given to
the parse-options API with a capital letter. Various (sub)commands'
option arrays don't follow the guideline provided by the parse_options
Documentation regarding the descriptions.
Downcase the first word of some option descriptions for "column"
and "range-diff".
Signed-off-by: Chinmoy Chakraborty <chinmoy12c@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our CMake configuration generates not only build definitions, but also
install definitions: After building Git using `msbuild git.sln`, the
built artifacts can be installed via `msbuild INSTALL.vcxproj`.
To specify _where_ the files should be installed, the
`-DCMAKE_INSTALL_PREFIX=<path>` option can be used when running CMake.
However, this process would really only install the files that were just
built. On Windows, we need more than that: We also need the `.dll` files
of the dependencies (such as libcurl). The `vcpkg` ecosystem, which we
use to obtain those dependencies, can be asked to install said `.dll`
files really easily, so let's do that.
This requires more than just the built `vcpkg` artifacts in the CI build
definition; We now clone the `vcpkg` repository so that the relevant
CMake scripts are available, in particular the ones related to defining
the toolchain.
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to add support for installing the `.dll` files of Git's
dependencies (such as libcurl) in the CMake configuration. The `vcpkg`
ecosystem from which we get said dependencies makes that relatively
easy: simply turn on `X_VCPKG_APPLOCAL_DEPS_INSTALL`.
However, current `vcpkg` introduces a limitation if one does that:
While it is totally cool with CMake to specify multiple targets within
one invocation of `install(TARGETS ...) (at least according to
https://cmake.org/cmake/help/latest/command/install.html#command:install),
`vcpkg`'s parser insists on a single target per `install(TARGETS ...)`
invocation.
Well, that's easily accomplished: Let's feed the targets individually to
the `install(TARGETS ...)` function in a `foreach()` look.
This also has the advantage that we do not have to manually cull off the
two entries from the `${PROGRAMS_BUILT}` array before scheduling the
remainder to be installed into `libexec/git-core`. Instead, we iterate
through the array and decide for each entry where it wants to go.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to make use of us now passing the
"msg_id" to the user defined "error_func". We can now compare against
the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.
Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.
I'm sticking this callback in fsck.c. Perhaps in the future we'd like
to accumulate such callbacks into another file (maybe fsck-cb.c,
similar to parse-options-cb.c?), but while we've got just the one
let's just put it into fsck.c.
A better alternative in this case would be some library some more
obvious library shared by fetch-pack.c ad builtin/index-pack.c, but
there isn't such a thing.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.
We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see
fetch-pack be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or doing on fsck twice there, but
we're not.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the behavior of the .gitmodules validation added in
5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".
I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.
But we're not, all we're doing in the rest of "cmd_index_pack()" is
further setup by calling fsck_set_msg_types(), and assigning to
do_fsck_object.
So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the gitmodules_{found,done} static variables added in
159e7b080b (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.
This requires changing the recently added register_found_gitmodules()
function added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) to take fsck_options. That function will be
removed in a subsequent commit, but as it'll require the new
gitmodules_found attribute of "fsck_options" we need this intermediate
step first.
An earlier version of this patch removed the small amount of
duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a
FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth
it for this amount of copy/pasting.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change code I added in acf9de4c94 (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.
At the time that the fsck_set_msg_type() API was introduced in
0282f4dced (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.
For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".
Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].
Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.
1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In f417eed8cd (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1 (fsck: introduce identifiers for fsck
messages, 2015-06-22).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason to defer the calling of parse_msg_type() until after
we've checked if the "id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.
This means we end up with a FSCK_IGNORE=0, which was previously
defined as "2".
I'm confident that nothing relies on these values, we always compare
them for equality. Let's not omit "0" so it won't be assumed that
we're using these as a boolean somewhere.
This also allows us to re-structure the fields to mark which are
"private" v.s. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.
These defines were originally introduced in:
- ba002f3b28 (builtin-fsck: move common object checking code to
fsck.c, 2008-02-25)
- f50c440730 (fsck: disallow demoting grave fsck errors to warnings,
2015-06-22)
- efaba7cc77 (fsck: optionally ignore specific fsck issues
completely, 2015-06-22)
- f27d05b170 (fsck: allow upgrading fsck warnings to errors,
2015-06-22)
The reason these were defined in two different places is because we
use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.
Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type" it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor "if options->msg_type" and other code added in
0282f4dced (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.
This is in preparation for changing its type in a subsequent commit,
only using it in the "!options->msg_type" scope makes that change
This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different id's in the "report" function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the append_msg_id() function in favor of calling
prepare_msg_ids(). We already have code to compute the camel-cased
msg_id strings in msg_id_info, let's use it.
When the append_msg_id() function was added in 71ab8fa840 (fsck:
report the ID of the error/warning, 2015-06-22) the prepare_msg_ids()
function didn't exist. When prepare_msg_ids() was added in
a46baac61e (fsck: factor out msg_id_info[] lazy initialization code,
2018-05-26) this code wasn't moved over to lazy initialization.
This changes the behavior of the code to initialize all the messages
instead of just camel-casing the one we need on the fly. Since the
common case is that we're printing just one message this is mostly
redundant work.
But that's OK in this case, reporting this fsck issue to the user
isn't performance-sensitive. If we were somehow doing so in a tight
loop (in a hopelessly broken repository?) this would help, since we'd
save ourselves from re-doing this work for identical messages, we
could just grab the prepared string from msg_id_info after the first
invocation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename variables in a function added in 0282f4dced (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).
It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.
Let's rename that to "severity", and rename "id" to "msg_id" and
"msg_id" to "msg_id_str" etc. This will make a follow-up change
smaller.
While I'm at it properly indent the fsck_set_msg_type() argument list.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d531 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).
So this doesn't really change anything as far as the generated code is
concerned, it just gives the compiler more information and makes this
easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
designated initializers. This allows us to omit those fields that
are initialized to 0 or NULL.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By mistake, the `.exe` extension is appended _twice_ when installing the
dashed executables into `libexec/git-core/` on Windows (the extension is
already appended when adding items to the `git_links` list in the
`#Creating hardlinks` section).
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like the Makefile-based build learned to skip hard-linking the
dashed built-ins in 179227d6e2 (Optionally skip linking/copying the
built-ins, 2020-09-21), this patch teaches the CMake-based build the
same trick.
Note: In contrast to the Makefile-based process, the built-ins would
only be linked during installation, not already when Git is built.
Therefore, the CMake-based build that we use in our CI builds _already_
does not link those built-ins (because the files are not installed
anywhere, they are used to run the test suite in-place).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we fix critical vulnerabilities, we follow some sort of
protocol (e.g. setting a coordinated release date, keeping the fix under
embargo until that time, coordinating with packagers and/or hosting
sites, etc).
Similar in spirit to `Documentation/howto/maintain-git.txt`, let's
formalize the details in a document.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same document, describe that Git does not have Long Term Support
(LTS) release trains, although security fixes are always applied to a
few of the most recent release trains.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When sanitizing client-supplied strings on Windows, also strip off
backslashes, not just slashes.
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git commit --fixup=<commit>", which was to tweak the changes made
to the contents while keeping the original log message intact,
learned "--fixup=(amend|reword):<commit>", that can be used to
tweak both the message and the contents, and only the message,
respectively.
* cm/rebase-i-fixup-amend-reword:
doc/git-commit: add documentation for fixup=[amend|reword] options
t3437: use --fixup with options to create amend! commit
t7500: add tests for --fixup=[amend|reword] options
commit: add a reword suboption to --fixup
commit: add amend suboption to --fixup to create amend! commit
sequencer: export and rename subject_length()
Follow-up fixes to "cm/rebase-i" topic.
* cm/rebase-i-updates:
doc/rebase -i: fix typo in the documentation of 'fixup' command
t/t3437: fixup the test 'multiple fixup -c opens editor once'
t/t3437: use named commits in the tests
t/t3437: simplify and document the test helpers
t/t3437: check the author date of fixed up commit
t/t3437: remove the dependency of 'expected-message' file from tests
t/t3437: fixup here-docs in the 'setup' test
t/lib-rebase: update the documentation of FAKE_LINES
rebase -i: clarify and fix 'fixup -c' rebase-todo help
sequencer: rename a few functions
sequencer: fixup the datatype of the 'flag' argument
"rebase -i" is getting cleaned up and also enhanced.
* cm/rebase-i:
doc/git-rebase: add documentation for fixup [-C|-c] options
rebase -i: teach --autosquash to work with amend!
t3437: test script for fixup [-C|-c] options in interactive rebase
rebase -i: add fixup [-C | -c] command
sequencer: use const variable for commit message comments
sequencer: pass todo_item to do_pick_commit()
rebase -i: comment out squash!/fixup! subjects from squash message
sequencer: factor out code to append squash message
rebase -i: only write fixup-message when it's needed
The http codepath learned to let the credential layer to cache the
password used to unlock a certificate that has successfully been
used.
* js/http-pki-credential-store:
http: drop the check for an empty proxy password before approving
http: store credential when PKI auth is used
Reorganize Makefile to allow building git.o and other essential
objects without extra stuff needed only for testing.
* ab/make-cleanup:
Makefile: add {program,xdiff,test,git,fuzz}-objs & objects targets
Makefile: split OBJECTS into OBJECTS and GIT_OBJS
Makefile: sort OBJECTS assignment for subsequent change
Makefile: split up long OBJECTS line
Makefile: guard against TEST_OBJS in the environment
The hashwrite() method takes an input buffer and updates a hashfile's
hash function while writing the data to a file. To avoid overuse of
flushes, the hashfile has an internal buffer and most writes will use
memcpy() to transfer data from the input 'buf' to the hashfile's buffer
of size 8 * 1024 bytes.
Logic introduced by a8032d12 (sha1write: don't copy full sized buffers,
2008-09-02) reduces the number of memcpy() calls when the input buffer
is sufficiently longer than the hashfile's buffer, causing nr to be the
length of the full buffer. In these cases, the input buffer is used
directly in chunks equal to the hashfile's buffer size.
This method caught my attention while investigating some performance
issues, but it turns out that these performance issues were noise within
the variance of the experiment.
However, during this investigation, I inspected hashwrite() and
misunderstood it, even after looking closely and trying to make it
faster. This change simply reorganizes some parts of the loop within
hashwrite() to make it clear that each batch either uses memcpy() to the
hashfile's buffer or writes directly from the input buffer. The previous
code relied on indirection through local variables and essentially
inlined the implementation of hashflush() to reduce lines of code.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff-index" codepath has been taught to trust fsmonitor status
to reduce number of lstat() calls.
* nk/diff-index-fsmonitor:
fsmonitor: add perf test for git diff HEAD
fsmonitor: add assertion that fsmonitor is valid to check_removed
fsmonitor: skip lstat deletion check during git diff-index
GIT_TEST_FAIL_PREREQS is a mechanism to skip test pieces with
prerequisites to catch broken tests that depend on the side effects
of optional pieces, but did not work at all when negative
prerequisites were involved.
* jk/fail-prereq-testfix:
t: annotate !PTHREADS tests with !FAIL_PREREQS
"git repack" so far has been only capable of repacking everything
under the sun into a single pack (or split by size). A cleverer
strategy to reduce the cost of repacking a repository has been
introduced.
* tb/geometric-repack:
builtin/pack-objects.c: ignore missing links with --stdin-packs
builtin/repack.c: reword comment around pack-objects flags
builtin/repack.c: be more conservative with unsigned overflows
builtin/repack.c: assign pack split later
t7703: test --geometric repack with loose objects
builtin/repack.c: do not repack single packs with --geometric
builtin/repack.c: add '--geometric' option
packfile: add kept-pack cache for find_kept_pack_entry()
builtin/pack-objects.c: rewrite honor-pack-keep logic
p5303: measure time to repack with keep
p5303: add missing &&-chains
builtin/pack-objects.c: add '--stdin-packs' option
revision: learn '--no-kept-objects'
packfile: introduce 'find_kept_pack_entry()'
As record_reused_object(offset, offset - hashfile_total(out)) said,
reused_chunk.difference should be the offset of original packfile minus
the offset of the generated packfile. But the comment presented an opposite way.
Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The text says something called a "patch" is prepared one for each
commit, it is suitable for e-mail submission, and "am" is the
command to use it, but does not say what the "patch" really is.
The description in the page also refers to the "three-dash" line,
but it is unclear what it is, unless the reader is given a more
detailed overview of what the "patch" is.
Add a brief paragraph to give an overview of what the output looks
like.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The completion for 'git stash' has not changed in a major way since it
was converted from shell script to builtin. Now that it's a builtin, we
can take advantage of the groundwork laid out by parse-options and use
the generated options.
Rewrite _git_stash() to take use __gitcomp_builtin() to generate
completions for subcommands.
The main `git stash` command does not take any arguments directly. If no
subcommand is given, it automatically defaults to `git stash push`. This
means that we can simplify the logic for when no subcommands have been
given yet. We only have to offer subcommand completions when we're
completing a non-option after "stash".
One area that this patch could improve upon is that the `git stash list`
command accepts log-options. It would be nice if the completion for this
were unified with that of _git_log() and _git_show() which would allow
completions to be provided for options such as `--pretty` but that is
outside the scope of this patch.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To save a level of indentation, perform an early return in the "if" arm
so we can move the "else" code out of the block.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many completion functions perform hardcoded comparisons with $cword.
This fails in the case where the main git command is given arguments
(e.g. `git -C . bundle<TAB>` would fail to complete its subcommands).
Even _git_worktree(), which uses __git_find_on_cmdline(), could still
fail. With something like `git -C add worktree move<TAB>`, the
subcommand would be incorrectly identified as "add" instead of "move".
Assign $__git_subcommand_idx in __git_main(), where the git subcommand
is actually found and the corresponding completion function is called.
Use this variable to replace hardcoded comparisons with $cword.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a stray "xb" I inadvertently introduced in 780aa0a21e (tests:
remove last uses of GIT_TEST_GETTEXT_POISON=false, 2021-02-11).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get-send-email currently makes the assumption that the
'sendemail-validate' hook exists inside of the repository.
Since the introduction of 'core.hooksPath' configuration option in
867ad08a26 (hooks: allow customizing where the hook directory is,
2016-05-04), this is no longer true.
Instead of assuming a hardcoded repo relative path, query
git for the actual path of the hooks directory.
Signed-off-by: Robert Foss <robert.foss@linaro.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the rebase.useBuiltin setting and the now-obsolete
GIT_TEST_REBASE_USE_BUILTIN test flag.
This was left in place after my d03ebd411c (rebase: remove the
rebase.useBuiltin setting, 2019-03-18) to help anyone who'd used the
experimental flag and wanted to know that it was the default, or that
they should transition their test environment to use the builtin
rebase unconditionally.
It's been more than long enough for those users to get a headsup about
this. So remove all the scaffolding that was left inplace after
d03ebd411c. I'm also removing the documentation entry, if anyone
still has this left in their configuration they can do some source
archaeology to figure out what it used to do, which makes more sense
than exposing every git user reading the documentation to this legacy
configuration switch.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `-v<n>` option of `format-patch` can give nothing but an
integral iteration number to patches in a series. Some people,
however, prefer to mark a new iteration with only a small fixup
with a non integral iteration number (e.g. an "oops, that was
wrong" fix-up patch for v4 iteration may be labeled as "v4.1").
Allow `format-patch` to take such a non-integral iteration
number.
`<n>` can be any string, such as '3.1' or '4rev2'. In the case
where it is a non-integral value, the "Range-diff" and "Interdiff"
headers will not include the previous version.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
point, we will already be holding the conversion attributes for each
entry, and it would be wasteful to let checkout_entry() load these
again. Instead, let's add the checkout_entry_ca() variant, which
optionally takes a preloaded conv_attrs struct.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following patch, checkout_entry() will use conv_attrs to decide
whether an entry should be enqueued for parallel checkout or not. But
the attributes lookup only happens lower in this call stack. To avoid
the unnecessary work of loading the attributes twice, let's move it up
to checkout_entry(), and pass the loaded struct down to write_entry().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that updates the in-memory index information after an entry is
written currently resides in write_entry(). Extract it to a public
function so that it can be called by the parallel checkout functions,
outside entry.c, in a later patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two functions will be used by the parallel checkout code, so let's
make them public. Note: fstat_output() is renamed to
fstat_checkout_output(), now that it has become public, seeking to avoid
future name collisions.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header. While at it let's also move a comment related to
checkout_entry() from entry.c to entry.h as it's more useful to describe
the function there.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically, Git has supported the 'Signed-off-by' commit trailer
using the '--signoff' and the '-s' option from the command line.
But users may need to provide other trailer information from the
command line such as "Helped-by", "Reported-by", "Mentored-by",
Now implement a new `--trailer <token>[(=|:)<value>]` option to pass
other trailers to `interpret-trailers` and insert them into commit
messages.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git -c core.bare=false clone --bare ..." would have segfaulted,
which has been corrected.
* bc/clone-bare-with-conflicting-config:
builtin/init-db: handle bare clones when core.bare set to false
"git log --format='...'" learned "%(describe)" placeholder.
* rs/pretty-describe:
archive: expand only a single %(describe) per archive
pretty: document multiple %(describe) being inconsistent
t4205: assert %(describe) test coverage
pretty: add merge and exclude options to %(describe)
pretty: add %(describe)
"git stash show" learned to optionally show untracked part of the
stash.
* dl/stash-show-untracked:
stash show: learn stash.showIncludeUntracked
stash show: teach --include-untracked and --only-untracked
Rename detection rework continues.
* en/ort-perf-batch-8:
diffcore-rename: compute dir_rename_guess from dir_rename_counts
diffcore-rename: limit dir_rename_counts computation to relevant dirs
diffcore-rename: compute dir_rename_counts in stages
diffcore-rename: extend cleanup_dir_rename_info()
diffcore-rename: move dir_rename_counts into dir_rename_info struct
diffcore-rename: add function for clearing dir_rename_count
Move computation of dir_rename_count from merge-ort to diffcore-rename
diffcore-rename: add a mapping of destination names to their indices
diffcore-rename: provide basic implementation of idx_possible_rename()
diffcore-rename: use directory rename guided basename comparisons
Updates to memory allocation code around the use of pcre2 library.
* ab/grep-pcre2-allocfix:
grep/pcre2: move definitions of pcre2_{malloc,free}
grep/pcre2: move back to thread-only PCREv2 structures
grep/pcre2: actually make pcre2 use custom allocator
grep/pcre2: use pcre2_maketables_free() function
grep/pcre2: use compile-time PCREv2 version test
grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
grep/pcre2: prepare to add debugging to pcre2_malloc()
grep/pcre2: correct reference to grep_init() in comment
grep/pcre2: drop needless assignment to NULL
grep/pcre2: drop needless assignment + assert() on opt->pcre2
Perf test update to work better in secondary worktrees.
* jk/perf-in-worktrees:
t/perf: avoid copying worktree files from test repo
t/perf: handle worktrees as test repos
A new configuration variable has been introduced to allow choosing
which version of the generation number gets used in the
commit-graph file.
* ds/commit-graph-generation-config:
commit-graph: use config to specify generation type
commit-graph: create local repository pointer
Update C code that sets a few configuration variables when a remote
is configured so that it spells configuration variable names in the
canonical camelCase.
* ab/remote-write-config-in-camel-case:
remote: write camel-cased *.pushRemote on rename
remote: add camel-cased *.tagOpt key, like clone
We had a code to diagnose and die cleanly when a required
clean/smudge filter is missing, but an assert before that
unnecessarily fired, hiding the end-user facing die() message.
* mt/cleanly-die-upon-missing-required-filter:
convert: fail gracefully upon missing clean cmd on required filter
It does not make sense to make ".gitattributes", ".gitignore" and
".mailmap" symlinks, as they are supposed to be usable from the
object store (think: bare repositories where HEAD:.mailmap etc. are
used). When these files are symbolic links, we used to read the
contents of the files pointed by them by mistake, which has been
corrected.
* jk/open-dotgitx-with-nofollow:
mailmap: do not respect symlinks for in-tree .mailmap
exclude: do not respect symlinks for in-tree .gitignore
attr: do not respect symlinks for in-tree .gitattributes
exclude: add flags parameter to add_patterns()
attr: convert "macro_ok" into a flags field
add open_nofollow() helper
When "git diff --no-index X Y" is run the modes of the files being
differ are normalized by canon_mode() in fill_filespec().
I recently broke that behavior in a patch of mine[1] which would pass
all tests, or not, depending on the umask of the git.git checkout.
Let's test for this explicitly. Arguably this should not be the
behavior of "git diff --no-index". We aren't diffing our own objects
or the index, so it might be useful to show mode differences between
files.
On the other hand diff(1) does not do that, and it would be needlessly
distracting when e.g. diffing an extracted tar archive whose contents
is the same, but whose file modes are different.
1. https://lore.kernel.org/git/20210316155829.31242-2-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing the bitmap walk, we first establish the set of of have
and want objects by iterating over the set of pending objects: if an
object is marked as uninteresting, it's declared as an object we already
have, otherwise as an object we want. These two sets are then used to
compute which transitively referenced objects we need to obtain.
One special case here are tag objects: when a tag is requested, we
resolve it to its first not-tag object and add both resolved objects as
well as the tag itself into either the have or want set. Given that the
uninteresting-property always propagates to referenced objects, it is
clear that if the tag is uninteresting, so are its children and vice
versa. But we fail to propagate the flag, which effectively means that
referenced objects will always be interesting except for the case where
they have already been marked as uninteresting explicitly.
This mislabeling does not impact correctness: we now have it in our
"wants" set, and given that we later do an `AND NOT` of the bitmaps of
"wants" and "haves" sets it is clear that the result must be the same.
But we now start to needlessly traverse the tag's referenced objects in
case it is uninteresting, even though we know that each referenced
object will be uninteresting anyway. In the worst case, this can lead to
a complete graph walk just to establish that we do not care for any
object.
Fix the issue by propagating the `UNINTERESTING` flag to pointees of tag
objects and add a benchmark with negative revisions to p5310. This shows
some nice performance benefits, tested with linux.git:
Test HEAD~ HEAD
---------------------------------------------------------------------------------------------------------------
5310.3: repack to disk 193.18(181.46+16.42) 194.61(183.41+15.83) +0.7%
5310.4: simulated clone 25.93(24.88+1.05) 25.81(24.73+1.08) -0.5%
5310.5: simulated fetch 2.64(5.30+0.69) 2.59(5.16+0.65) -1.9%
5310.6: pack to file (bitmap) 58.75(57.56+6.30) 58.29(57.61+5.73) -0.8%
5310.7: rev-list (commits) 1.45(1.18+0.26) 1.46(1.22+0.24) +0.7%
5310.8: rev-list (objects) 15.35(14.22+1.13) 15.30(14.23+1.07) -0.3%
5310.9: rev-list with tag negated via --not --all (objects) 22.49(20.93+1.56) 0.11(0.09+0.01) -99.5%
5310.10: rev-list with negative tag (objects) 0.61(0.44+0.16) 0.51(0.35+0.16) -16.4%
5310.11: rev-list count with blob:none 12.15(11.19+0.96) 12.18(11.19+0.99) +0.2%
5310.12: rev-list count with blob:limit=1k 17.77(15.71+2.06) 17.75(15.63+2.12) -0.1%
5310.13: rev-list count with tree:0 1.69(1.31+0.38) 1.68(1.28+0.39) -0.6%
5310.14: simulated partial clone 20.14(19.15+0.98) 19.98(18.93+1.05) -0.8%
5310.16: clone (partial bitmap) 12.78(13.89+1.07) 12.72(13.99+1.01) -0.5%
5310.17: pack to file (partial bitmap) 42.07(45.44+2.72) 41.44(44.66+2.80) -1.5%
5310.18: rev-list with tree filter (partial bitmap) 0.44(0.29+0.15) 0.46(0.32+0.14) +4.5%
While most benchmarks are probably in the range of noise, the newly
added 5310.9 and 5310.10 benchmarks consistenly perform better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the username and password are supplied in a url like this
https://myuser:secret@git.exampe/myrepo.git and the server supports the
negotiate authenticaten method, git does not fall back to basic auth and
libcurl hardly tries to authenticate with the negotiate method.
Stop using the Negotiate authentication method after the first failure
because if it fails on the first try it will never succeed.
Signed-off-by: Christopher Schenk <christopher@cschenk.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create t0052-simple-ipc.sh with unit tests for the "simple-ipc" mechanism.
Create t/helper/test-simple-ipc test tool to exercise the "simple-ipc"
functions.
When the tool is invoked with "run-daemon", it runs a server to listen
for "simple-ipc" connections on a test socket or named pipe and
responds to a set of commands to exercise/stress the communication
setup.
When the tool is invoked with "start-daemon", it spawns a "run-daemon"
command in the background and waits for the server to become ready
before exiting. (This helps make unit tests in t0052 more predictable
and avoids the need for arbitrary sleeps in the test script.)
The tool also has a series of client "send" commands to send commands
and data to a server instance.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create Unix domain socket based implementation of "simple-ipc".
A set of `ipc_client` routines implement a client library to connect
to an `ipc_server` over a Unix domain socket, send a simple request,
and receive a single response. Clients use blocking IO on the socket.
A set of `ipc_server` routines implement a thread pool to listen for
and concurrently service client connections.
The server creates a new Unix domain socket at a known location. If a
socket already exists with that name, the server tries to determine if
another server is already listening on the socket or if the socket is
dead. If socket is busy, the server exits with an error rather than
stealing the socket. If the socket is dead, the server creates a new
one and starts up.
If while running, the server detects that its socket has been stolen
by another server, it automatically exits.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test for --exit-code working with --no-index. There's no reason
to suppose it wouldn't, but we weren't testing for it anywhere in our
tests. Let's fix that blind spot.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
transport_get_remote_refs() can populate the transport struct's
remote_refs. transport_disconnect() is already responsible for most of
transport's cleanup - therefore we also take care of freeing remote_refs
there.
There are 2 locations where transport_disconnect() is called before
we're done using the returned remote_refs. This patch changes those
callsites to only call transport_disconnect() after the returned refs
are no longer being used - which is necessary to safely be able to
free remote_refs during transport_disconnect().
This commit fixes the following leak which was found while running
t0000, but is expected to also fix the same pattern of leak in all
locations that use transport_get_remote_refs():
Direct leak of 165 byte(s) in 1 object(s) allocated from:
#0 0x49a6b2 in calloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x9a72f2 in xcalloc /home/ahunt/oss-fuzz/git/wrapper.c:140:8
#2 0x8ce203 in alloc_ref_with_prefix /home/ahunt/oss-fuzz/git/remote.c:867:20
#3 0x8ce1a2 in alloc_ref /home/ahunt/oss-fuzz/git/remote.c:875:9
#4 0x72f63e in process_ref_v2 /home/ahunt/oss-fuzz/git/connect.c:426:8
#5 0x72f21a in get_remote_refs /home/ahunt/oss-fuzz/git/connect.c:525:8
#6 0x979ab7 in handshake /home/ahunt/oss-fuzz/git/transport.c:305:4
#7 0x97872d in get_refs_via_connect /home/ahunt/oss-fuzz/git/transport.c:339:9
#8 0x9774b5 in transport_get_remote_refs /home/ahunt/oss-fuzz/git/transport.c:1388:4
#9 0x51cf80 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:1271:9
#10 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#11 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#12 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#13 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#14 0x69c45e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#15 0x7f6a459d5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
preprocess_options() allocates new strings for help messages for
OPTION_ALIAS. Therefore we also need to clean those help messages up
when freeing the returned options.
First introduced in:
7c280589cf (parse-options: teach "git cmd -h" to show alias as alias, 2020-03-16)
The preprocessed options themselves no longer contain any indication
that a given option is/was an alias - therefore we add a new flag to
indicate former aliases. (An alternative approach would be to look back
at the original options to determine which options are aliases - but
that seems like a fragile approach. Or we could even look at the
alias_groups list - which might be less fragile, but would be slower
as it requires nested looping.)
As far as I can tell, parse_options() is only ever used once per
command, and the help messages are small - hence this leak has very
little impact.
This leak was found while running t0001. LSAN output can be found below:
Direct leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x49a859 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9aae36 in xrealloc /home/ahunt/oss-fuzz/git/wrapper.c:126:8
#2 0x939d8d in strbuf_grow /home/ahunt/oss-fuzz/git/strbuf.c:98:2
#3 0x93b936 in strbuf_vaddf /home/ahunt/oss-fuzz/git/strbuf.c:392:3
#4 0x93b7ff in strbuf_addf /home/ahunt/oss-fuzz/git/strbuf.c:333:2
#5 0x86747e in preprocess_options /home/ahunt/oss-fuzz/git/parse-options.c:666:3
#6 0x866ed2 in parse_options /home/ahunt/oss-fuzz/git/parse-options.c:847:17
#7 0x51c4a7 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:989:9
#8 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#9 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#10 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#11 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#12 0x69c9fe in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#13 0x7fdac42d4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because it's easier to read, but also likely to be easier to maintain.
I am making this change because I need to add a new flag in a later
commit.
Also add a trailing comma to the last enum entry to simplify addition of
new flags.
This change was originally suggested by Peff in:
https://public-inbox.org/git/YEZ%2FBWWbpfVwl6nO@coredump.intra.peff.net/
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the signature of read_tree_recursive() to omit the "base",
"baselen" and "stage" arguments. No callers of it use these parameters
for anything anymore.
The last function to call read_tree_recursive() with a non-"" path was
read_tree_recursive() itself, but that was changed in
ffd31f661d (Reimplement read_tree_recursive() using
tree_entry_interesting(), 2011-03-25).
The last user of the "stage" parameter went away in the last commit,
and even that use was mere boilerplate.
So let's remove those and rename the read_tree_recursive() function to
just read_tree(). We had another read_tree() function that I've
refactored away in preceding commits, since all in-tree users read
trees recursively with a callback we can change the name to signify
that this is the norm.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the static read_tree_1() function to read_tree_at(). This
function works just like read_tree_recursive(), except you provide
your own strbuf.
This step doesn't make much sense now, but in follow-up commits I'll
remove the base/baselen/stage arguments to read_tree_recursive(). At
that point an anticipated in-tree user[1] for the old
read_tree_recursive() couldn't provide a path to start the
traversal.
Let's give them a function to do so with an API that makes more sense
for them, by taking a strbuf we should be able to avoid more casting
and/or reallocations in the future.
1. https://lore.kernel.org/git/xmqqft106sok.fsf@gitster.g
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "stage" variable being passed around in the archive code has only
ever been an elaborate way to hardcode the value "0".
This code was added in its original form in e4fbbfe9ec (Add
git-zip-tree, 2006-08-26), at which point a hardcoded "0" would be
passed down through read_tree_recursive() to write_zip_entry().
It was then diligently added to the "struct directory" in
ed22b4173b (archive: support filtering paths with glob, 2014-09-21),
but we were still not doing anything except passing it around as-is.
Let's stop doing that in the code internal to archive.c, we'll still
feed "0" to read_tree_recursive() itself, but won't use it. That we're
providing it at all to read_tree_recursive() will be changed in a
follow-up commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor away the read_tree() function into its only user,
overlay_tree_on_index().
First, change read_one_entry_opt() to use the strbuf parameter
read_tree_recursive() passes down in place. This finishes up a partial
refactoring started in 6a0b0b6de9 (tree.c: update read_tree_recursive
callback to pass strbuf as base, 2014-11-30).
Moving the rest into overlay_tree_on_index() makes this index juggling
we're doing easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that read_tree() has been moved to ls-files.c we can get rid of
the stage != 1 case that'll never happen.
Let's not use read_tree_recursive() as a pass-through to pass "stage =
1" either. For now we'll pass an unused "stage = 0" for consistency
with other read_tree_recursive() callers, that argument will be
removed in a follow-up commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the read_tree() API was added around the same time as
read_tree_recursive() in 94537c78a8 (Move "read_tree()" to
"tree.c"[...], 2005-04-22) and b12ec373b8 ([PATCH] Teach read-tree
about commit objects, 2005-04-20) things have gradually migrated over
to the read_tree_recursive() version.
Now builtin/ls-files.c is the last user of this code, let's move all
the relevant code there. This allows for subsequent simplification of
it, and an eventual move to read_tree_recursive().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for "ls-files --with-tree". There was effectively no
coverage for any normal usage of this command, only the tests added in
54e1abce90 (Add test case for ls-files --with-tree, 2007-10-03) for
an obscure bug.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing tests for showing a tree with "git show". Let's test for
showing a tree, two trees, and that doing so doesn't recurse.
The only tests for this code added in 5d7eeee2ac (git-show: grok
blobs, trees and tags, too, 2006-12-14) were the tests in
t7701-repack-unpack-unreachable.sh added in ccc1297226 (repack:
modify behavior of -A option to leave unreferenced objects unpacked,
2008-05-09).
Let's add this common mode of operation to the "show" tests
themselves. It's more obvious, and the tests in
t7701-repack-unpack-unreachable.sh happily pass if we start buggily
emitting trees recursively.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for switching from merge-recursive to merge-ort as the
default strategy, have the testsuite default to running with merge-ort.
Keep coverage of the recursive backend by having the linux-gcc job run
with it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we started on merge-ort, thousands of tests failed when run with
the GIT_TEST_MERGE_ALGORITHM=ort flag; with so many, it didn't make
sense to flip all their test expectations. The ones in t6409, t6418,
and the submodule tests are being handled by an independent in-flight
topic ("Complete merge-ort implemenation...almost"). The ones in
t6423 were left out of the other series because other ongoing series
that this commit depends upon were addressing those. Now that we only
have one remaining test failure in t6423, let's mark it as such.
This remaining test will be fixed by a future optimization series, but
since merge-recursive doesn't pass this test either, passing it is not
necessary for declaring merge-ort ready for general use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 5ced7c3da0, which was
put in place as a temporary measure to avoid optimizations unstably
erroring out on no destination having a majority of the necessary
renames for directories that had no new files and thus no need for
directory rename detection anyway. Now that optimizations are in place
to prevent us from trying to compute directory rename count computations
for directories that do not need it, we can undo this temporary measure.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The plan is to just delete merge-recursive, but not until everyone is
comfortable with merge-ort as a replacement. Given that I haven't
switched all callers of merge-recursive over yet (e.g. git-am still uses
merge-recursive), maybe there's some value documenting known bugs in the
algorithm in case we end up keeping it or someone wants to dig it up in
the future.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a variety of questions users might ask while resolving
conflicts:
* What changes have been made since the previous (first) parent?
* What changes are staged?
* What is still unstaged? (or what is still conflicted?)
* What changes did I make to resolve conflicts so far?
The first three of these have simple answers:
* git diff HEAD
* git diff --cached
* git diff
There was no way to answer the final question previously. Adding one
is trivial in merge-ort, since it works by creating a tree representing
what should be written to the working copy complete with conflict
markers. Simply write that tree to .git/AUTO_MERGE, allowing users to
answer the fourth question with
* git diff AUTO_MERGE
I avoided using a name like "MERGE_AUTO", because that would be
merge-specific (much like MERGE_HEAD, REBASE_HEAD, REVERT_HEAD,
CHERRY_PICK_HEAD) and I wanted a name that didn't change depending on
which type of operation the merge was part of.
Ensure that paths which clean out other temporary operation-specific
files (e.g. CHERRY_PICK_HEAD, MERGE_MSG, rebase-merge/ state directory)
also clean out this AUTO_MERGE file.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-ort handles submodules (and directory/file conflicts in general)
differently than merge-recursive does; it basically puts all the special
handling for different filetypes into one place in the codebase instead
of needing special handling for different filetypes in many different
code paths. This one code path in merge-ort could perhaps use some work
still (there are still test_expect_failure cases in the testsuite), but
it passes all the tests that merge-recursive does as well as 12
additional ones that merge-recursive fails. Mark those 12 tests as
test_expect_success under merge-ort.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merge conflicts occur in paths removed by a sparse-checkout, we
need to unsparsify those paths (clear the SKIP_WORKTREE bit), and write
out the conflicted file to the working copy. In the very unlikely case
that someone manually put a file into the working copy at the location
of the SKIP_WORKTREE file, we need to avoid overwriting whatever edits
they have made and move that file to a different location first.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is a conflict during a merge for a SKIP_WORKTREE entry, we
expect that file to be written to the working copy and have the
SKIP_WORKTREE bit cleared in the index. If the user had manually
created a file in the working tree despite SKIP_WORKTREE being set, we
do not want to clobber their changes to that file, but want to move it
out of the way. Add tests that check for these behaviors.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive has some simple code to support subtree shifting; copy
it over to merge-ort. This fixes t6409.12 under
GIT_TEST_MERGE_ALGORITHM=ort.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a modify/delete conflict, but the only change to the
modification is e.g. change of line endings, then if renormalization is
requested then we should be able to recognize such a case as a
not-modified/delete and resolve the conflict automatically.
This fixes t6418.10 under GIT_TEST_MERGE_ALGORITHM=ort.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ll_merge() needs an index when renormalization is requested. Create one
specifically for just this purpose with just the one needed entry. This
fixes t6418.4 and t6418.5 under GIT_TEST_MERGE_ALGORITHM=ort.
NOTE 1: Even if the user has a working copy or a real index (which is
not a given as merge-ort can be used in bare repositories), we
explicitly ignore any .gitattributes file from either of these
locations. merge-ort can be used to merge two branches that are
unrelated to HEAD, so .gitattributes from the working copy and current
index should not be considered relevant.
NOTE 2: Since we are in the middle of merging, there is a risk that
.gitattributes itself is conflicted...leaving us with an ill-defined
situation about how to perform the rest of the merge. It could be that
the .gitattributes file does not even exist on one of the sides of the
merge, or that it has been modified on both sides. If it's been
modified on both sides, it's possible that it could itself be merged
cleanly, though it's also possible that it only merges cleanly if you
use the right version of the .gitattributes file to drive the merge. It
gets kind of complicated. The only test we ever had that attempted to
test behavior in this area was seemingly unaware of the undefined
behavior, but knew the test wouldn't work for lack of attribute handling
support, marked it as test_expect_failure from the beginning, but
managed to fail for several reasons unrelated to attribute handling.
See commit 6f6e7cfb52 ("t6038: remove problematic test", 2020-08-03) for
details. So there are probably various ways to improve what
initialize_attr_index() picks in the case of a conflicted .gitattributes
but for now I just implemented something simple -- look for whatever
.gitattributes file we can find in any of the higher order stages and
use it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
renormalize_buffer() requires an index_state, which is something that
merge-ort does not operate with. However, all the renormalization code
needs is an index with a .gitattributes file...plus a little bit of
setup. Create such an index, along with the deallocation and
attr_direction handling.
A subsequent commit will add a function to finish the initialization
of this index.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rename/rename conflict handling depends on the fact that if both sides
renamed the same path, that the one on the MERGE_SIDE1 will appear first
in the combined diff_queue_struct passed to process_renames(). Since we
add all pairs from MERGE_SIDE1 to combined first, and then all pairs
from MERGE_SIDE2, and then sort based on filename, this will only be
true if the sort used is stable. This was found due to the fact that
Mac, unlike Linux, apparently has a system-defined qsort that is not
stable.
While we are at it, review the other callers of QSORT and add comments
about why they can remain as calls to QSORT instead of being modified
to call STABLE_QSORT.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git pack-objects --stdin-packs' encounters a commit in a pack, it
marks it as a starting point of a best-effort reachability traversal
that is used to populate the name-hash of the objects listed in the
given packs.
The traversal expects that it should be able to walk the ancestors of
all commits in a pack without issue. Ordinarily this is the case, but it
is possible to having missing parents from an unreachable part of the
repository. In that case, we'd consider any missing objects in the
unreachable portion of the graph to be junk.
This should be handled gracefully: since the traversal is best-effort
(i.e., we don't strictly need to fill in all of the name-hash fields),
we should simply ignore any missing links.
This patch does that (by setting the 'ignore_missing_links' bit on the
rev_info struct), and ensures we don't regress in the future by adding a
test which demonstrates this case.
It is a little over-eager, since it will also ignore missing links in
reachable parts of the packs (which would indicate a corrupted
repository), but '--stdin-packs' is explicitly *not* about reachability.
So this step isn't making anything worse for a repository which contains
packs missing reachable objects (since we never drop objects with
'--stdin-packs').
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Note on using Asciidoctor to build documentation suite.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted a few commits ago ("diffcore-rename: only compute
dir_rename_count for relevant directories"), when a source file rename
is used as part of directory rename detection, we need to increment
counts for each ancestor directory in dirs_removed with value
RELEVANT_FOR_SELF. However, a few commits ago ("diffcore-rename: check
if we have enough renames for directories early on"), we may have
downgraded all relevant ancestor directories from RELEVANT_FOR_SELF to
RELEVANT_FOR_ANCESTOR.
For a given file, if no ancestor directory is found in dirs_removed with
a value of RELEVANT_FOR_SELF, then we can downgrade
relevant_source[PATH] from RELEVANT_LOCATION to RELEVANT_NO_MORE. This
means we can skip detecting a rename for that particular path (and any
other paths in the same directory).
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.680 s ± 0.096 s 5.665 s ± 0.129 s
mega-renames: 13.812 s ± 0.162 s 11.435 s ± 0.158 s
just-one-mega: 506.0 ms ± 3.9 ms 494.2 ms ± 6.1 ms
While this improvement looks rather modest for these testcases (because
all the previous optimizations were sufficient to nearly remove all time
spent in rename detection already), consider this alternative testcase
tweaked from the ones in commit 557ac0350d as follows
<Same initial setup as commit 557ac0350d, then...>
$ git switch -c add-empty-file v5.5
$ >drivers/gpu/drm/i915/new-empty-file
$ git add drivers/gpu/drm/i915/new-empty-file
$ git commit -m "new file"
$ git switch 5.4-rename
$ git cherry-pick --strategy=ort add-empty-file
For this testcase, we see the following improvement:
Before After
pick-empty: 1.936 s ± 0.024 s 688.1 ms ± 4.2 ms
So roughly a factor of 3 speedup. At $DAYJOB, there was a particular
repository and cherry-pick that inspired this optimization; for that
case I saw a speedup factor of 7 with this optimization.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two different reasons we might want a rename for a file -- for
three-way content merging or as part of directory rename detection.
Record the reason. diffcore-rename will potentially be able to filter
some of the ones marked as needed only for directory rename detection,
if it can determine those directory renames based solely on renames
found via exact rename detection and basename-guided rename detection.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit can only be effective if we have a computation of
the number of paths under a given directory which are still have pending
renames, and expected this number to be recorded in the dir_rename_count
map under the key UNKNOWN_DIR. Add the code necessary to compute these
values.
Note that this change means dir_rename_count might have a directory
whose only entry (for UNKNOWN_DIR) was removed by the time merge-ort
goes to check it. To account for this, merge-ort needs to check for the
case where the max count is 0.
With this change we are now computing the necessary value for each
directory in dirs_removed, but are not using that value anywhere. The
next two commits will make use of the values stored in dirs_removed in
order to compute whether each relevant_source (that is needed only for
directory rename detection) has become unnecessary.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in the past few commits, if we can determine that a directory
already has enough renames to determine how directory rename detection
will be decided for that directory, then we can mark that directory as
no longer needing any more renames detected for files underneath it.
For such directories, we change the value in the dirs_removed map from
RELEVANT_TO_SELF to RELEVANT_FOR_ANCESTOR.
A subsequent patch will use this information while iterating over the
remaining potential rename sources to mark ones that were only
location_relevant as unneeded if no containing directory is still marked
as RELEVANT_TO_SELF.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When one side adds files to a directory that the other side renamed,
directory rename detection is used to either move the new paths to the
newer directory or warn the user about the fact that another path
location might be better.
If a parent of the given directory had new files added to it, any
renames in the current directory are also part of determining where the
parent directory is renamed to. Thus, naively, we need to record each
rename N times for a path at depth N. However, we can use the
additional information added to dirs_removed in the last commit to avoid
traversing all N parent directories in many cases. Let's use an example
to explain how this works. If we have a path named
src/old_dir/a/b/file.c
and src/old_dir doesn't exist on one side of history, but the other
added a file named src/old_dir/newfile.c, then if one side renamed
src/old_dir/a/b/file.c => source/new_dir/a/b/file.c
then this file would affect potential directory rename detection counts
for
src/old_dir/a/b => source/new_dir/a/b
src/old_dir/a => source/new_dir/a
src/old_dir => source/new_dir
src => source
adding a weight of 1 to each in dir_rename_counts. However, if src/
exists on both sides of history, then we don't need to track any entries
for it in dir_rename_counts. That was implemented previously. What we
are adding now, is that if no new files were added to src/old_dir/a or
src/old_dir/b, then we don't need to have counts in dir_rename_count
for those directories either.
In short, we only need to track counts in dir_rename_count for
directories whose dirs_removed value is RELEVANT_FOR_SELF. And as soon
as we reach a directory that isn't in dirs_removed (signalled by
returning the default value of NOT_RELEVANT from strintmap_get()), we
can stop looking any further up the directory hierarchy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When one side of history renames a directory, and the other side of
history added files to the old directory, directory rename detection is
used to warn about the location of the added files so the user can
move them to the old directory or keep them with the new one.
This sets up three different types of directories:
* directories that had new files added to them
* directories underneath a directory that had new files added to them
* directories where no new files were added to it or any leading path
Save this information in dirs_removed; the next several commits will
make use of this information.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in the previous commit, we want to be able to take advantage of
the "majority rules" portion of directory rename detection to avoid
detecting more renames than necessary. However, for diffcore-rename to
take advantage of that, it needs to know whether a rename source file
was needed for just directory rename detection reasons, or if it is
wanted for potential three-way content merging. Modify relevant_sources
from a strset to a strintmap, so we can encode additional information.
We also modify dirs_removed from a strset to a strintmap at the same
time because trying to determine what files are needed for directory
rename detection will require us tracking a bit more information for
each directory.
This commit only changes the types of the two variables from strset to
strintmap; it does not actually store any special values yet and for now
only checks for presence of entries in the strintmap. Thus, the code is
functionally identical to how it behaved before. Future commits will
start associating values with each key for these two maps.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In directory rename detection (when a directory is removed on one side
of history and the other side adds new files to that directory), we work
to find where the greatest number of files within that directory were
renamed to so that the new files can be moved with the majority of the
files.
Naively, we can just do this by detecting renames for *all* files within
the removed/renamed directory, looking at all the destination
directories where files within that directory were moved, and if there
is more than one such directory then taking the one with the greatest
number of files as the directory where the old directory was renamed to.
However, sometimes there are enough renames from exact rename detection
or basename-guided rename detection that we have enough information to
determine the majority winner already. Add a function meant to compute
whether particular renames are still needed based on this majority rules
check. The next several commits will then add the necessary
infrastructure to get the information we need to compute which
additional rename sources we can skip.
An important side note for future further optimization:
There is a possible improvement to this optimization that I have not yet
attempted and will not be included in this series of patches: we could
first check whether exact renames provide enough information for us to
determine directory renames, and avoid doing basename-guided rename
detection on some or all of the RELEVANT_LOCATION files within those
directories. In effect, this variant would mean doing the
handle_early_known_dir_renames() both after exact rename detection and
again after basename-guided rename detection, though it would also mean
decrementing the number of "unknown" renames for each rename we found
from basename-guided rename detection. Adding this additional check for
skippable renames right after exact rename detection might turn out to
be valuable, especially for partial clones where it might allow us to
download certain source files entirely. However, this particular
optimization was actually the last one I did in original implementation
order, and by the time I implemented this idea, every testcase I had was
sufficiently fast that further optimization was unwarranted. If future
testcases arise that tax rename detection more heavily (or perhaps
partial clones can benefit from avoiding loading more objects), it may
be worth implementing this more involved variant.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests in t5300 and t7810 expect us to complain about a "--threads"
argument when Git is compiled without pthread support. Running these
under GIT_TEST_FAIL_PREREQS produces a confusing failure: we pretend to
the tests that there is no pthread support, so they expect the warning,
but of course the actual build is perfectly happy to respect the
--threads argument.
We never noticed before the recent a926c4b904 (tests: remove most uses
of C_LOCALE_OUTPUT, 2021-02-11), because the tests also were marked as
requiring the C_LOCALE_OUTPUT prerequisite. Which means they'd never
have run in FAIL_PREREQS mode, since it would always pretend that the
locale prereq was not satisfied.
These tests can't possibly work in this mode; it is a mismatch between
what the tests expect and what the build was told to do. So let's just
mark them to be skipped, using the special prereq introduced by
dfe1a17df9 (tests: add a special setup where prerequisites fail,
2019-05-13).
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', not relying, thus, on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it's better to leave
it for the main process, anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation. Note that crlf_action is renamed to
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Validate that fsmonitor is valid to futureproof against bugs where
check_removed might be called from places that haven't refreshed.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git to honor fsmonitor rather than issuing an lstat
when checking for dirty local deletes. Eliminates O(files)
lstats during `git diff HEAD`
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
symlink.c:check_leading_path() started returning different codes for
FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was
not adjusted for this change, so it started to follow symlinks on the
leading path of to-be-removed entries. Fix that and add a regression
test.
Note that since 1d718a5108 check_leading_path() no longer differentiates
the case where it found a symlink in the path's leading components from
the cases where it found a regular file or failed to lstat() the
component. So, a side effect of this current patch is that
unlink_entry() now returns early in all of these three cases. And
because we no longer try to unlink such paths, we also don't get the
warning from remove_or_warn().
For the regular file and symlink cases, it's questionable whether the
warning was useful in the first place: unlink_entry() removes tracked
paths that should no longer be present in the state we are checking out
to. If the path had its leading dir replaced by another file, it means
that the basename already doesn't exist, so there is no need for a
warning. Sure, we are leaving a regular file or symlink behind at the
path's dirname, but this file is either untracked now (so again, no
need to warn), or it will be replaced by a tracked file during the next
phase of this checkout operation.
As for failing to lstat() one of the leading components, the basename
might still exist only we cannot unlink it (e.g. due to the lack of the
required permissions). Since the user expect it to be removed
(especially with checkout's --no-overlay option), add back the warning
in this more relevant case.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
the comment on top of threaded_check_leading_path() is outdated and no
longer reflects the behavior of this function. Let's updated it to avoid
confusions. While we are here, also remove some duplicated comments to
avoid similar maintenance problems.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor code I recently changed in 1f3299fda9 (fsck: make
fsck_config() re-usable, 2021-01-05) so that I could use fsck's config
callback in mktag in 1f3299fda9 (fsck: make fsck_config() re-usable,
2021-01-05).
I don't know what I was thinking in structuring the code this way, but
it clearly makes no sense to have an fsck_config_internal() at all
just so it can get a fsck_options when git_config() already supports
passing along some void* data.
Let's just make use of that instead, which gets us rid of the two
wrapper functions, and brings fsck's common config callback in line
with other such reusable config callbacks.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a wrapper class for `unix_stream_listen()` that uses a ".lock"
lockfile to create the unix domain socket in a race-free manner.
Unix domain sockets have a fundamental problem on Unix systems because
they persist in the filesystem until they are deleted. This is
independent of whether a server is actually listening for connections.
Well-behaved servers are expected to delete the socket when they
shutdown. A new server cannot easily tell if a found socket is
attached to an active server or is leftover cruft from a dead server.
The traditional solution used by `unix_stream_listen()` is to force
delete the socket pathname and then create a new socket. This solves
the latter (cruft) problem, but in the case of the former, it orphans
the existing server (by stealing the pathname associated with the
socket it is listening on).
We cannot directly use a .lock lockfile to create the socket because
the socket is created by `bind(2)` rather than the `open(2)` mechanism
used by `tempfile.c`.
As an alternative, we hold a plain lockfile ("<path>.lock") as a
mutual exclusion device. Under the lock, we test if an existing
socket ("<path>") is has an active server. If not, we create a new
socket and begin listening. Then we use "rollback" to delete the
lockfile in all cases.
This wrapper code conceptually exists at a higher-level than the core
unix_stream_connect() and unix_stream_listen() routines that it
consumes. It is isolated in a wrapper class for clarity.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calls to `chdir()` are dangerous in a multi-threaded context. If
`unix_stream_listen()` or `unix_stream_connect()` is given a socket
pathname that is too long to fit in a `sockaddr_un` structure, it will
`chdir()` to the parent directory of the requested socket pathname,
create the socket using a relative pathname, and then `chdir()` back.
This is not thread-safe.
Teach `unix_sockaddr_init()` to not allow calls to `chdir()` when this
flag is set.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update `unix_stream_listen()` to take an options structure to override
default behaviors. This commit includes the size of the `listen()` backlog.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The static helper function `unix_stream_socket()` calls `die()`. This
is not appropriate for all callers. Eliminate the wrapper function
and make the callers propagate the error.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create Windows implementation of "simple-ipc" using named pipes.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Brief design documentation for new IPC mechanism allowing
foreground Git client to talk with an existing daemon process
at a known location using a named pipe or unix domain socket.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the calling sequence of `read_packetized_to_strbuf()` to take
an options argument and not assume a fixed set of options. Update the
only existing caller accordingly to explicitly pass the
formerly-assumed flags.
The `read_packetized_to_strbuf()` function calls `packet_read()` with
a fixed set of assumed options (`PACKET_READ_GENTLE_ON_EOF`). This
assumption has been fine for the single existing caller
`apply_multi_file_filter()` in `convert.c`.
In a later commit we would like to add other callers to
`read_packetized_to_strbuf()` that need a different set of options.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce PACKET_READ_GENTLE_ON_READ_ERROR option to help libify the
packet readers.
So far, the (possibly indirect) callers of `get_packet_data()` can ask
that function to return an error instead of `die()`ing upon end-of-file.
However, random read errors will still cause the process to die.
So let's introduce an explicit option to tell the packet reader
machinery to please be nice and only return an error on read errors.
This change prepares pkt-line for use by long-running daemon processes.
Such processes should be able to serve multiple concurrent clients and
and survive random IO errors. If there is an error on one connection,
a daemon should be able to drop that connection and continue serving
existing and future connections.
This ability will be used by a Git-aware "Builtin FSMonitor" feature
in a later patch series.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `packet_flush_gently()` call in `write_packetized_from_buf() and
`write_packetized_from_fd()` and require the caller to call it if desired.
Rename both functions to `write_packetized_from_*_no_flush()` to prevent
later merge accidents.
`write_packetized_from_buf()` currently only has one caller:
`apply_multi_file_filter()` in `convert.c`. It always wants a flush packet
to be written after writing the payload.
However, we are about to introduce a caller that wants to write many
packets before a final flush packet, so let's make the caller responsible
for emitting the flush packet.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach `packet_write_gently()` to write the pkt-line header and the actual
buffer in 2 separate calls to `write_in_full()` and avoid the need for a
static buffer, thread-safe scratch space, or an excessively large stack
buffer.
Change `write_packetized_from_fd()` to allocate a temporary buffer rather
than using a static buffer to avoid similar issues here.
These changes are intended to make it easier to use pkt-line routines in
a multi-threaded context with multiple concurrent writers writing to
different streams.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git commit --fixup=reword:<commit>` aliases
`--fixup=amend:<commit> --only`, where it creates an empty "amend!"
commit that will reword <commit> without changing its contents when
it is rebased with `--autosquash`.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git commit --fixup=amend:<commit>` will create an "amend!" commit.
The resulting commit message subject will be "amend! ..." where
"..." is the subject line of <commit> and the initial message
body will be <commit>'s message.
The "amend!" commit when rebased with --autosquash will fixup the
contents and replace the commit message of <commit> with the
"amend!" commit's message body.
In order to prevent rebase from creating commits with an empty
message we refuse to create an "amend!" commit if commit message
body is empty.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
template_dir starts off pointing to either argv or nothing. However if
the value supplied in argv is a relative path, absolute_pathdup() is
used to turn it into an absolute path. absolute_pathdup() allocates
a new string, and we then "leak" it when cmd_init_db() completes.
We don't bother to actually free the return value (instead we UNLEAK
it), because there's no significant advantage to doing so here.
Correctly freeing it would require more significant changes to code flow
which would be more noisy than beneficial.
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary goal of this change is to stop leaking init_db_template_dir.
This leak can happen because:
1. git_init_db_config() allocates new memory into init_db_template_dir
without first freeing the existing value.
2. init_db_template_dir might already contain data, either because:
2.1 git_config() can be invoked twice with this callback in a single
process - at least 2 allocations are likely.
2.2 A single git_config() allocation can invoke the callback multiple
times for a given key (see further explanation in the function
docs) - each of those calls will trigger another leak.
The simplest fix for the leak would be to free(init_db_template_dir)
before overwriting it. Instead we choose to convert to fetching
init.templatedir via git_config_get_value() as that is more explicit,
more efficient, and avoids allocations (the returned result is owned by
the config cache, so we aren't responsible for freeing it).
If we remove init_db_template_dir, git_init_db_config() ends up being
responsible only for forwarding core.* config values to
platform_core_config(). However platform_core_config() already ignores
non-core.* config values, so we can safely remove git_init_db_config()
and invoke git_config() directly with platform_core_config() as the
callback.
The platform_core_config forwarding was originally added in:
287853392a (mingw: respect core.hidedotfiles = false in git-init again, 2019-03-11
And I suspect the potential for a leak existed since the original
implementation of git_init_db_config in:
90b45187ba (Add `init.templatedir` configuration variable., 2010-02-17)
LSAN output from t0001:
Direct leak of 73 byte(s) in 1 object(s) allocated from:
#0 0x49a859 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9a7276 in xrealloc /home/ahunt/oss-fuzz/git/wrapper.c:126:8
#2 0x9362ad in strbuf_grow /home/ahunt/oss-fuzz/git/strbuf.c:98:2
#3 0x936eaa in strbuf_add /home/ahunt/oss-fuzz/git/strbuf.c:295:2
#4 0x868112 in strbuf_addstr /home/ahunt/oss-fuzz/git/./strbuf.h:304:2
#5 0x86a8ad in expand_user_path /home/ahunt/oss-fuzz/git/path.c:758:2
#6 0x720bb1 in git_config_pathname /home/ahunt/oss-fuzz/git/config.c:1287:10
#7 0x5960e2 in git_init_db_config /home/ahunt/oss-fuzz/git/builtin/init-db.c:161:11
#8 0x7255b8 in configset_iter /home/ahunt/oss-fuzz/git/config.c:1982:7
#9 0x7253fc in repo_config /home/ahunt/oss-fuzz/git/config.c:2311:2
#10 0x725ca7 in git_config /home/ahunt/oss-fuzz/git/config.c:2399:2
#11 0x593e8d in create_default_files /home/ahunt/oss-fuzz/git/builtin/init-db.c:225:2
#12 0x5935c6 in init_db /home/ahunt/oss-fuzz/git/builtin/init-db.c:449:11
#13 0x59588e in cmd_init_db /home/ahunt/oss-fuzz/git/builtin/init-db.c:714:9
#14 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#15 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#16 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#17 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#18 0x69c4de in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#19 0x7f23552d6349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure that we release the temporary strbuf during dwim_branch() for
all codepaths (and not just for the early return).
This leak appears to have been introduced in:
f60a7b763f (worktree: teach "add" to check out existing branches, 2018-04-24)
Note that UNLEAK(branchname) is still needed: the returned result is
used in add(), and is stored in a pointer which is used to point at one
of:
- a string literal ("HEAD")
- member of argv (whatever the user specified in their invocation)
- or our newly allocated string returned from dwim_branch()
Fixing the branchname leak isn't impossible, but does not seem
worthwhile given that add() is called directly from cmd_main(), and
cmd_main() returns immediately thereafter - UNLEAK is good enough.
This leak was found when running t0001 with LSAN, see also LSAN output
below:
Direct leak of 60 byte(s) in 1 object(s) allocated from:
#0 0x49a859 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9ab076 in xrealloc /home/ahunt/oss-fuzz/git/wrapper.c:126:8
#2 0x939fcd in strbuf_grow /home/ahunt/oss-fuzz/git/strbuf.c:98:2
#3 0x93af53 in strbuf_splice /home/ahunt/oss-fuzz/git/strbuf.c:239:3
#4 0x83559a in strbuf_check_branch_ref /home/ahunt/oss-fuzz/git/object-name.c:1593:2
#5 0x6988b9 in dwim_branch /home/ahunt/oss-fuzz/git/builtin/worktree.c:454:20
#6 0x695f8f in add /home/ahunt/oss-fuzz/git/builtin/worktree.c:525:19
#7 0x694a04 in cmd_worktree /home/ahunt/oss-fuzz/git/builtin/worktree.c:1036:10
#8 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#9 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#10 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#11 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#12 0x69caee in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#13 0x7f7b7dd10349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of these pointers can safely be freed when cmd_clone() completes,
therefore we make sure to free them. The one exception is that we
have to UNLEAK(repo) because it can point either to argv[0], or a
malloc'd string returned by absolute_pathdup().
We also have to free(path) in the middle of cmd_clone(): later during
cmd_clone(), path is unconditionally overwritten with a different path,
triggering a leak. Freeing the first path immediately after use (but
only in the case where it contains data) seems like the cleanest
solution, as opposed to freeing it unconditionally before path is reused
for another path. This leak appears to have been introduced in:
f38aa83f9a (use local cloning if insteadOf makes a local URL, 2014-07-17)
These leaks were found when running t0001 with LSAN, see also an excerpt
of the LSAN output below (the full list is omitted because it's far too
long, and mostly consists of indirect leakage of members of the refs we
are freeing).
Direct leak of 178 byte(s) in 1 object(s) allocated from:
#0 0x49a53d in malloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9a6ff4 in do_xmalloc /home/ahunt/oss-fuzz/git/wrapper.c:41:8
#2 0x9a6fca in xmalloc /home/ahunt/oss-fuzz/git/wrapper.c:62:9
#3 0x8ce296 in copy_ref /home/ahunt/oss-fuzz/git/remote.c:885:8
#4 0x8d2ebd in guess_remote_head /home/ahunt/oss-fuzz/git/remote.c:2215:10
#5 0x51d0c5 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:1308:4
#6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#10 0x69c45e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#11 0x7f6a459d5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 165 byte(s) in 1 object(s) allocated from:
#0 0x49a53d in malloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9a6fc4 in do_xmalloc /home/ahunt/oss-fuzz/git/wrapper.c:41:8
#2 0x9a6f9a in xmalloc /home/ahunt/oss-fuzz/git/wrapper.c:62:9
#3 0x8ce266 in copy_ref /home/ahunt/oss-fuzz/git/remote.c:885:8
#4 0x51e9bd in wanted_peer_refs /home/ahunt/oss-fuzz/git/builtin/clone.c:574:21
#5 0x51cfe1 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:1284:17
#6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#10 0x69c42e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#11 0x7f8fef0c2349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 178 byte(s) in 1 object(s) allocated from:
#0 0x49a53d in malloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x9a6ff4 in do_xmalloc /home/ahunt/oss-fuzz/git/wrapper.c:41:8
#2 0x9a6fca in xmalloc /home/ahunt/oss-fuzz/git/wrapper.c:62:9
#3 0x8ce296 in copy_ref /home/ahunt/oss-fuzz/git/remote.c:885:8
#4 0x8d2ebd in guess_remote_head /home/ahunt/oss-fuzz/git/remote.c:2215:10
#5 0x51d0c5 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:1308:4
#6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#10 0x69c45e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#11 0x7f6a459d5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 165 byte(s) in 1 object(s) allocated from:
#0 0x49a6b2 in calloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x9a72f2 in xcalloc /home/ahunt/oss-fuzz/git/wrapper.c:140:8
#2 0x8ce203 in alloc_ref_with_prefix /home/ahunt/oss-fuzz/git/remote.c:867:20
#3 0x8ce1a2 in alloc_ref /home/ahunt/oss-fuzz/git/remote.c:875:9
#4 0x72f63e in process_ref_v2 /home/ahunt/oss-fuzz/git/connect.c:426:8
#5 0x72f21a in get_remote_refs /home/ahunt/oss-fuzz/git/connect.c:525:8
#6 0x979ab7 in handshake /home/ahunt/oss-fuzz/git/transport.c:305:4
#7 0x97872d in get_refs_via_connect /home/ahunt/oss-fuzz/git/transport.c:339:9
#8 0x9774b5 in transport_get_remote_refs /home/ahunt/oss-fuzz/git/transport.c:1388:4
#9 0x51cf80 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:1271:9
#10 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#11 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#12 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#13 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#14 0x69c45e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#15 0x7f6a459d5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Direct leak of 105 byte(s) in 1 object(s) allocated from:
#0 0x49a859 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0x9a71f6 in xrealloc /home/ahunt/oss-fuzz/git/wrapper.c:126:8
#2 0x93622d in strbuf_grow /home/ahunt/oss-fuzz/git/strbuf.c:98:2
#3 0x937a73 in strbuf_addch /home/ahunt/oss-fuzz/git/./strbuf.h:231:3
#4 0x939fcd in strbuf_add_absolute_path /home/ahunt/oss-fuzz/git/strbuf.c:911:4
#5 0x69d3ce in absolute_pathdup /home/ahunt/oss-fuzz/git/abspath.c:261:2
#6 0x51c688 in cmd_clone /home/ahunt/oss-fuzz/git/builtin/clone.c:1021:10
#7 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#8 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#9 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#10 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#11 0x69c45e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#12 0x7f6a459d5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dwim_ref() allocs a new string into ref. Instead of setting to NULL to
discard it, we can FREE_AND_NULL.
This leak appears to have been introduced in:
4cf76f6bbf (builtin/reset: compute checkout metadata for reset, 2020-03-16)
This leak was found when running t0001 with LSAN, see also LSAN output below:
Direct leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x486514 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0x9a7108 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
#2 0x8add6b in expand_ref /home/ahunt/oss-fuzz/git/refs.c:670:12
#3 0x8ad777 in repo_dwim_ref /home/ahunt/oss-fuzz/git/refs.c:644:22
#4 0x6394af in dwim_ref /home/ahunt/oss-fuzz/git/./refs.h:162:9
#5 0x637e5c in cmd_reset /home/ahunt/oss-fuzz/git/builtin/reset.c:426:4
#6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#10 0x69c5ce in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#11 0x7f57ebb9d349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
shorten_unambiguous_ref() returns an allocated string. We have to
track it separately from the const refname.
This leak has existed since:
9ab55daa55 (git symbolic-ref --delete $symref, 2012-10-21)
This leak was found when running t0001 with LSAN, see also LSAN output
below:
Direct leak of 19 byte(s) in 1 object(s) allocated from:
#0 0x486514 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0x9ab048 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
#2 0x8b452f in refs_shorten_unambiguous_ref /home/ahunt/oss-fuzz/git/refs.c
#3 0x8b47e8 in shorten_unambiguous_ref /home/ahunt/oss-fuzz/git/refs.c:1287:9
#4 0x679fce in check_symref /home/ahunt/oss-fuzz/git/builtin/symbolic-ref.c:28:14
#5 0x679ad8 in cmd_symbolic_ref /home/ahunt/oss-fuzz/git/builtin/symbolic-ref.c:70:9
#6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
#7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
#8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
#9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
#10 0x69cc6e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
#11 0x7f98388a4349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the EXAMPLES section, git-push(1) says that 'git push origin' pushes
the current branch to the value of the 'remote.origin.merge'
configuration.
This wording (which dates back to b2ed944af7 (push: switch default from
"matching" to "simple", 2013-01-04)) is incorrect. There is no such
configuration as 'remote.<name>.merge'. This likely was originally
intended to read "branch.<name>.merge" instead.
Indeed, when 'push.default' is 'simple' (which is the default value, and
is applicable in this scenario per "without additional configuration"),
setup_push_upstream() dies if the branch's local name does not match
'branch.<name>.merge'.
Correct this long-standing typo to resolve some recent confusion on the
intended behavior of this example.
Reported-by: Adam Sharafeddine <adam.shrfdn@gmail.com>
Reported-by: Fabien Terrani <terranifabien@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
credential_approve() already checks for a non-empty password before
saving, so there's no need to do the extra check here.
Signed-off-by: John Szakmeister <john@szakmeister.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already looked for the PKI credentials in the credential store, but
failed to approve it on success. Meaning, the PKI certificate password
was never stored and git would request it on every connection to the
remote. Let's complete the chain by storing the certificate password on
success.
Likewise, we also need to reject the credential when there is a failure.
Curl appears to report client-related certificate issues are reported
with the CURLE_SSL_CERTPROBLEM error. This includes not only a bad
password, but potentially other client certificate related problems.
Since we cannot get more information from curl, we'll go ahead and
reject the credential upon receiving that error, just to be safe and
avoid caching or saving a bad password.
Signed-off-by: John Szakmeister <john@szakmeister.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every %(describe) placeholder in $Format:...$ strings in files with the
attribute export-subst is expanded by calling git describe. This can
potentially result in a lot of such calls per archive. That's OK for
local repositories under control of the user of git archive, but could
be a problem for hosted repositories.
Expand only a single %(describe) placeholder per archive for now to
avoid denial-of-service attacks. We can make this limit configurable
later if needed, but let's start out simple.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The basename comparison optimization implemented in
find_basename_matches() is very beneficial since it allows a source to
sometimes only be compared with one other file instead of N other files.
When a match is found, both a source and destination can be removed from
the matrix of inexact rename comparisons. In contrast, the irrelevant
source optimization only allows us to remove a source from the matrix of
inexact rename comparisons...but it has the advantage of allowing a
source file to not even be loaded into memory at all and be compared to
0 other files. Generally, not even comparing is a bigger performance
win, so when both optimizations could apply, prefer to use the
irrelevant-source optimization.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.708 s ± 0.111 s 5.680 s ± 0.096 s
mega-renames: 102.171 s ± 0.440 s 13.812 s ± 0.162 s
just-one-mega: 3.471 s ± 0.015 s 506.0 ms ± 3.9 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore_rename_extended() will do a bunch of setup, then check for
exact renames, then abort before inexact rename detection if there are
no more sources or destinations that need to be matched. It will
sometimes be the case, however, that either
* we start with neither any sources or destinations
* we start with no *relevant* sources
In the first of these two cases, the setup and exact rename detection
will be very cheap since there are 0 files to operate on. In the second
case, it is quite possible to have thousands of files with none of the
source ones being relevant. Avoid calling diffcore_rename_extended() or
even some of the setup before diffcore_rename_extended() when we can
determine that rename detection is unnecessary.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 6.003 s ± 0.048 s 5.708 s ± 0.111 s
mega-renames: 114.009 s ± 0.236 s 102.171 s ± 0.440 s
just-one-mega: 3.489 s ± 0.017 s 3.471 s ± 0.015 s
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The past several commits determined conditions when rename sources might
be needed, and filled a relevant_sources strset with those paths. Pass
these along to diffcore_rename_extended() to use to limit the sources
that we need to detect renames for.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 12.596 s ± 0.061 s 6.003 s ± 0.048 s
mega-renames: 130.465 s ± 0.259 s 114.009 s ± 0.236 s
just-one-mega: 3.958 s ± 0.010 s 3.489 s ± 0.017 s
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of directory rename detection is that if one side of history
renames a directory, and the other side adds new files under the old
directory, then the merge can move those new files into the new
directory. This leads to the following important observation:
* If the other side does not add any new files under the old
directory, we do not need to detect any renames for that directory.
Similarly, directory rename detection had an important requirement:
* If a directory still exists on one side of history, it has not been
renamed on that side of history. (See section 4 of t6423 or
Documentation/technical/directory-rename-detection.txt for more
details).
Using these two bits of information, we note that directory rename
detection is only needed in cases where (1) directories exist in the
merge base and on one side of history (i.e. dirmask == 3 or dirmask ==
5), and (2) where there is some new file added to that directory on the
side where it still exists (thus where the file has filemask == 2 or
filemask == 4, respectively). This has to be done in two steps, because
we have the dirmask when we are first considering the directory, and
won't get the filemasks for the files within it until we recurse into
that directory. So, we save
dir_rename_mask = dirmask - 1
when we hit a directory that is missing on one side, and then later look
for cases of
filemask == dir_rename_mask
One final note is that as soon as we hit a directory that needs
directory rename detection, we will need to detect renames in all
subdirectories of that directory as well due to the "majority rules"
decision when files are renamed into different directory hierarchies.
We arbitrarily use the special value of 0x07 to record when we've hit
such a directory.
The combination of all the above mean that we introduce a variable
named dir_rename_mask (couldn't think of a better name) which has one
of the following values as we traverse into a directory:
* 0x00: directory rename detection not needed
* 0x02 or 0x04: directory rename detection only needed if files added
* 0x07: directory rename detection definitely needed
We then pass this value through to add_pairs() so that it can mark
location_relevant as true only when dir_rename_mask is 0x07.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add traverse_trees_wrapper() and traverse_trees_wrapper_callback()
functions. The former runs traverse_trees() with info->fn set to
traverse_trees_wrapper_callback, in order to simply save all the entries
without processing or recursing into any of them. This step allows
extra computation to be done (e.g. checking some condition across all
files) that can be used later. Then, after that is completed, it
iterates over all the saved entries and calls the original info->fn
callback with the saved data.
Currently, this does nothing more than marginally slowing down the tree
traversal since we do not take advantage of the opportunity to compute
anything special in traverse_trees_wrapper_callback(), and thus the real
callback will be called identically as it would have been without this
extra wrapper. However, a subsequent commit will add some special
computation of some values that the real callback will be able to use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to determine whether directory rename detection is needed, we
as a pre-requisite need a way to traverse through all the files in a
given tree before visiting any directories within that tree.
traverse_trees() only iterates through the entries in the order they
appear, so add some data structures that will store all the entries as
we iterate through them in traverse_trees(), which will allow us to
re-traverse them in our desired order.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rename detection works by trying to pair all file deletions (or
"sources") with all file additions (or "destinations"), checking
similarity, and then marking the sufficiently similar ones as renames.
This can be expensive if there are many sources and destinations on a
given side of history as it results in an N x M comparison matrix.
However, there are many cases where we can compute in advance that
detecting renames for some of the sources provides no useful information
and thus that we can exclude those sources from the matrix.
To see why, first note that the merge machinery uses detected renames in
two ways:
* directory rename detection: when one side of history renames a
directory, and the other side of history adds new files to that
directory, we want to be able to warn the user about the need to
chose whether those new files stay in the old directory or move
to the new one.
* three-way content merging: in order to do three-way content merging
of files, we need three different file versions. If one side of
history renamed a file, then some of the content for the file is
found under a different path than in the merge base or on the
other side of history.
Add a simple testcase showing the two kinds of reasons renames are
relevant; it's a testcase that will only pass if we detect both kinds of
needed renames.
Other than the testcase added above, this commit concentrates just on
the three-way content merging; it will punt and mark all sources as
needed for directory rename detection, and leave it to future commits to
narrow that down more.
The point of three-way content merging is to reconcile changes made on
*both* sides of history. What if the file wasn't modified on both
sides? There are two possibilities:
* If it wasn't modified on the renamed side:
-> then we get to do exact rename detection, which is cheap.
* If it wasn't modified on the unrenamed side:
-> then detection of a rename for that source file is irrelevant
That latter claim might be surprising at first, so let's walk through a
case to show why rename detection for that source file is irrelevant.
Let's use two filenames, old.c & new.c, with the following abbreviated
object ids (and where the value '000000' is used to denote that the file
is missing in that commit):
old.c new.c
MERGE_BASE: 01d01d 000000
MERGE_SIDE1: 01d01d 000000
MERGE_SIDE2: 000000 5e1ec7
If the rename *isn't* detected:
then old.c looks like it was unmodified on one side and deleted on
the other and should thus be removed. new.c looks like a new file we
should keep as-is.
If the rename *is* detected:
then a three-way content merge is done. Since the version of the
file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge
will produce exactly the version of the file whose abbreviated
object id is 5e1ec7. It will record that file at the path new.c,
while removing old.c from the directory.
Note that these two results are identical -- a single file named 'new.c'
with object id 5e1ec7. In other words, it doesn't matter if the rename
is detected in the case where the file is unmodified on the unrenamed
side.
Use this information to compute whether we need rename detection for
each source created in add_pair().
It's probably worth noting that there used to be a few other edge or
corner cases besides three-way content merges and directory rename
detection where lack of rename detection could have affected the result,
but those cases actually highlighted where conflict resolution methods
were not consistent with each other. Fixing those inconsistencies were
thus critically important to enabling this optimization. That work
involved the following:
* bringing consistency to add/add, rename/add, and rename/rename
conflict types, as done back in the topic merged at commit
ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04),
and further extended in commits 2a7c16c980 ("t6422, t6426: be more
flexible for add/add conflicts involving renames", 2020-08-10) and
e8eb99d4a6 ("t642[23]: be more flexible for add/add conflicts
involving pair renames", 2020-08-10)
* making rename/delete more consistent with modify/delete
as done in commits 1f3c9ba707 ("t6425: be more flexible with
rename/delete conflict messages", 2020-08-10) and 727c75b23f
("t6404, t6423: expect improved rename/delete handling in ort
backend", 2020-10-26)
Since the set of relevant_sources we compute has not yet been narrowed
down for directory rename detection, we do not pass it to
diffcore_rename_extended() yet. That will be done after subsequent
commits narrow down the list of relevant_sources needed for directory
rename detection reasons.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the ability to diffcore_rename_extended() to allow external callers
to declare that they only need renames detected for a subset of source
files, and use that information to skip detecting renames for them.
There are two important pieces to this optimization that may not be
obvious at first glance:
* We do not require callers to just filter the filepairs out
to remove the non-relevant sources, because exact rename detection
is fast and when it finds a match it can remove both a source and a
destination whereas the relevant_sources filter can only remove a
source.
* We need to filter out the source pairs in a preliminary pass instead
of adding a
strset_contains(relevant_sources, one->path)
check within the nested matrix loop. The reason for that is if we
have 30k renames, doing 30k * 30k = 900M strset_contains() calls
becomes extraordinarily expensive and defeats the performance gains
from this change; we only want to do 30k such calls instead.
If callers pass NULL for relevant_sources, that is special cases to
treat all sources as relevant. Since all callers currently pass NULL,
this optimization does not yet have any effect. Subsequent commits will
have merge-ort compute a set of relevant_sources to restrict which
sources we detect renames for, and have merge-ort pass that set of
relevant_sources to diffcore_rename_extended().
A note about filtering order:
Some may be curious why we don't filter out irrelevant sources at the
same time we filter out exact renames. While that technically could be
done at this point, there are two reasons to defer it:
First, was to reinforce a lesson that was too easy to forget. As I
mentioned above, in the past I filtered irrelevant sources out before
exact rename checking, and then discovered that exact renames' ability
to remove both sources and destinations was an important consideration
and thus doing the filtering after exact rename checking would speed
things up. Then at some point I realized that basename matching could
also remove both sources and destinations, and decided to put irrelevant
source filtering after basename filtering. That slowed things down a
lot. But, despite learning about this important ordering, in later
restructuring I forgot and made the same mistake of putting the
filtering after basename guided rename detection again. So, I have this
series of patches structured to do the irrelevant filtering last to
start to show how much extra it costs, and then add relevant filtering
in to find_basename_matches() to show how much it speeds things up.
Basically, it's a way to reinforce something that apparently was too
easy to forget, and make sure the commit messages record this lesson.
Second, the items in the "relevant_sources" in this patch series will
include all sources that *might be* relevant. It has to be conservative
and catch anything that might need a rename, but in the patch series
after this one we'll find ways to weed out more of the *might be*
relevant ones. Unfortunately, merge-ort does not have sufficient
information to weed those ones out, and there isn't enough information
at the time of filtering exact renames out to remove the extra ones
either. It has to be deferred. So the deferral is in part to simplify
some later additions.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 552955ed7f ("clone: use more conventional config/option layering",
2020-10-01), clone learned to read configuration options earlier in its
execution, before creating the new repository. However, that led to a
problem: if the core.bare setting is set to false in the global config,
cloning a bare repository segfaults. This happens because the
repository is falsely thought to be non-bare, but clone has set the work
tree to NULL, which is then dereferenced.
The code to initialize the repository already considers the fact that a
user might want to override the --bare option for git init, but it
doesn't take into account clone, which uses a different option. Let's
just check that the work tree is not NULL, since that's how clone
indicates that the repository is bare. This is also the case for git
init, so we won't be regressing that case.
Reported-by: Joseph Vusich <jvusich@amazon.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking whether a commit was rewritten to a single object id, we
use a glob that insists on a 40-hex result. This works for sha1, but
fails t7003 when run with GIT_TEST_DEFAULT_HASH=sha256.
Since the previous commit simplified the case statement here, we only
have two arms: an empty string or a single object id. We can just loosen
our glob to match anything, and still distinguish those cases (we lose
the ability to notice bogus input, but that's not a problem; we are the
one who wrote the map in the first place, and anyway update-ref will
complain loudly if the input isn't a valid hash).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a ref maps to a commit that is neither rewritten nor kept by
filter-branch (e.g., because it was eliminated by rev-list's pathspec
selection), we rewrite it to its nearest ancestor.
Since the initial commit in 6f6826c52b (Add git-filter-branch,
2007-06-03), we have warned when there are multiple such ancestors in
the map file. However, the warning code is impossible to trigger these
days. Since a0e46390d3 (filter-branch: fix ref rewriting with
--subdirectory-filter, 2008-08-12), we find the ancestor using "rev-list
-1", so it can only ever have a single value.
This code is made doubly confusing by the fact that we append to the map
file when mapping ancestors. However, this can never yield multiple
values because:
- we explicitly check whether the map already exists, and if so, do
nothing (so our "append" will always be to a file that does not
exist)
- even if we were to try mapping twice, the process to do so is
deterministic. I.e., we'd always end up with the same ancestor for a
given sha1. So warning about it would be pointless; there is no
ambiguity.
So swap out the warning code for a BUG (which we'll simplify further in
the next commit). And let's stop using the append operator to make the
ancestor-mapping code less confusing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After it has rewritten all of the commits, filter-branch will then
rewrite each of the input refs based on the resulting map of old/new
commits. But we don't have any explicit test coverage of this code.
Let's make sure we are covering each of those cases:
- deleting a ref when all of its commits were pruned
- rewriting a ref based on the mapping (this happens throughout the
script, but let's make sure we generate the correct messages)
- rewriting a ref whose tip was excluded, in which case we rewrite to
the nearest ancestor. Note in this case that we still insist that no
"warning" line is present (even though it looks like we'd trigger
the "... was rewritten into multiple commits" one). See the next
commit for more details.
Note these all pass currently, but the latter two will fail when run
with GIT_TEST_DEFAULT_HASH=sha256.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit teaches `git stash show --include-untracked`. It
may be desirable for a user to be able to always enable the
--include-untracked behavior. Teach the stash.showIncludeUntracked
config option which allows users to do this in a similar manner to
stash.showPatch.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stash entries can be made with untracked files via
`git stash push --include-untracked`. However, because the untracked
files are stored in the third parent of the stash entry and not the
stash entry itself, running `git stash show` does not include the
untracked files as part of the diff.
With --include-untracked, untracked paths, which are recorded in the
third-parent if it exists, are shown in addition to the paths that have
modifications between the stash base and the working tree in the stash.
It is possible to manually craft a malformed stash entry where duplicate
untracked files in the stash entry will mask tracked files. We detect
and error out in that case via a custom unpack_trees() callback:
stash_worktree_untracked_merge().
Also, teach stash the --only-untracked option which only shows the
untracked files of a stash entry. This is similar to `git show stash^3`
but it is nice to provide a convenient abstraction for it so that users
do not have to think about the underlying implementation.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment in this block is meant to indicate that passing '--all',
'--reflog', and so on aren't necessary when repacking with the
'--geometric' option.
But, it has two problems: first, it is factually incorrect ('--all' is
*not* incompatible with '--stdin-packs' as the comment suggests);
second, it is quite focused on the geometric case for a block that is
guarding against it.
Reword this comment to address both issues.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a number of places in the geometric repack code where we
multiply the number of objects in a pack by another unsigned value. We
trust that the number of objects in a pack is always representable by a
uint32_t, but we don't necessarily trust that that number can be
multiplied without overflow.
Sprinkle some unsigned_add_overflows() and unsigned_mult_overflows() in
split_pack_geometry() to check that we never overflow any unsigned types
when adding or multiplying them.
Arguably these checks are a little too conservative, and certainly they
do not help the readability of this function. But they are serving a
useful purpose, so I think they are worthwhile overall.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To determine the where to place the split when repacking with the
'--geometric' option, split_pack_geometry() assigns the "split" variable
and then decrements it in a loop.
It would be equivalent (and more readable) to assign the split to the
loop position after exiting the loop, so do that instead.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't currently have a test that demonstrates the non-idempotent
behavior of 'git repack --geometric' with loose objects, so add one here
to make sure we don't regress in this area.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 0fabafd0b9 (builtin/repack.c: add '--geometric' option, 2021-02-22),
the 'git repack --geometric' code aborts early when there is zero or one
pack.
When there are no packs, this code does the right thing by placing the
split at "0". But when there is exactly one pack, the split is placed at
"1", which means that "git repack --geometric" (with any factor)
repacks all of the objects in a single pack.
This is wasteful, and the remaining code in split_pack_geometry() does
the right thing (not repacking the objects in a single pack) even when
only one pack is present.
Loosen the guard to only stop when there aren't any packs, and let the
rest of the code do the right thing. Add a test to ensure that this is
the case.
Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although `test -f` has the same functionality as test_path_is_file(), in
the case where test_path_is_file() fails, we get much better debugging
information.
Replace `test -f` with test_path_is_file so that future developers
will have a better experience debugging these test cases.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the hash-transition, git can operate on more than just SHA-1
repositories. Replace "sha1"-specific documentation with hash-agnostic
terminology.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In modern documentation, args, placeholders and filenames are
monospaced. Apply monospace formatting to these objects.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each %(describe) placeholder is expanded using a separate git describe
call. Their outputs depend on the tags present at the time, so there's
no consistency guarantee. Document that fact.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that the test is covering both describable and
undescribable commits.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference-transaction hook doesn't clearly document its scope and
what values it receives as input. Document it to make it less surprising
and clearly delimit its (current) scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The githooks(5) documentation states in several places that the hook
will receive a SHA-1 or hashes of 40 characters length. Given that we're
transitioning to a world where both SHA-1 and SHA-256 are supported,
this is inaccurate.
Fix the issue by replacing mentions of SHA-1 with "object name" and not
explicitly mentioning the hash size.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir_rename_counts has a mapping of a mapping, in particular, it has
old_dir => { new_dir => count }
We want a simple mapping of
old_dir => new_dir
based on which new_dir had the highest count for a given old_dir.
Compute this and store it in dir_rename_guess.
This is the final piece of the puzzle needed to make our guesses at
which directory files have been moved to when basenames aren't unique.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 12.775 s ± 0.062 s 12.596 s ± 0.061 s
mega-renames: 188.754 s ± 0.284 s 130.465 s ± 0.259 s
just-one-mega: 5.599 s ± 0.019 s 3.958 s ± 0.010 s
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are using dir_rename_counts to count the number of other directories
that files within a directory moved to. We only need this information
for directories that disappeared, though, so we can return early from
update_dir_rename_counts() for other paths.
If dirs_removed is passed to diffcore_rename_extended(), then it
provides the relevant bits of information for us to limit this counting
to relevant dirs. If dirs_removed is not passed, we would need to
compute some replacement in order to do this limiting. Introduce a new
info->relevant_source_dirs variable for this purpose, even though at
this stage we will only set it to dirs_removed for simplicity.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compute dir_rename_counts based just on exact renames to start, as that
can provide us useful information in find_basename_matches(). This is
done by moving the code from compute_dir_rename_counts() into
initialize_dir_rename_info(), resulting in it being computed earlier and
based just on exact renames. Since that's an incomplete result, we
augment the counts via calling update_dir_rename_counts() after each
basename-guide and inexact rename detection match is found.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diffcore_rename_extended() is passed a NULL dir_rename_count, we
will still want to create a temporary one for use by
find_basename_matches(), but have it fully deallocated before
diffcore_rename_extended() returns. However, when
diffcore_rename_extended() is passed a dir_rename_count, we want to fill
that strmap with appropriate values and return it. However, for our
interim purposes we may also add entries corresponding to directories
that cannot have been renamed due to still existing on both sides.
Extend cleanup_dir_rename_info() to handle these two different cases,
cleaning up the relevant bits of information for each case.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This continues the migration of the directory rename detection code into
diffcore-rename, now taking the simple step of combining it with the
dir_rename_info struct. Future commits will then make dir_rename_counts
be computed in stages, and add computation of dir_rename_guess.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we adjust the usage of dir_rename_count we want to have a function
for clearing, or partially clearing it out. Add a
partial_clear_dir_rename_count() function for this purpose.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the computation of dir_rename_count from merge-ort.c to
diffcore-rename.c, making slight adjustments to the data structures
based on the move. While the diffstat looks large, viewing this commit
with --color-moved makes it clear that only about 20 lines changed.
With this patch, the computation of dir_rename_count is still only done
after inexact rename detection, but subsequent commits will add a
preliminary computation of dir_rename_count after exact rename
detection, followed by some updates after inexact rename detection.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compute a mapping of full filename to the index within rename_dst where
that filename is found, and store it in idx_map. idx_possible_rename()
needs this to quickly finding an array entry in rename_dst given the
pathname.
While at it, add placeholder initializations for dir_rename_count and
dir_rename_guess; these will be more fully populated in subsequent
commits.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new struct dir_rename_info with various values we need inside our
idx_possible_rename() function introduced in the previous commit. Add a
basic implementation for this function showing how we plan to use the
variables, but which will just return early with a value of -1 (not
found) when those variables are not set up.
Future commits will do the work necessary to set up those other
variables so that idx_possible_rename() does not always return -1.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit noted that it is very common for people to move files
across directories while keeping their filename the same. The last few
commits took advantage of this and showed that we can accelerate rename
detection significantly using basenames; since files with the same
basename serve as likely rename candidates, we can check those first and
remove them from the rename candidate pool if they are sufficiently
similar.
Unfortunately, the previous optimization was limited by the fact that
the remaining basenames after exact rename detection are not always
unique. Many repositories have hundreds of build files with the same
name (e.g. Makefile, .gitignore, build.gradle, etc.), and may even have
hundreds of source files with the same name. (For example, the linux
kernel has 100 setup.c, 87 irq.c, and 112 core.c files. A repository at
$DAYJOB has a lot of ObjectFactory.java and Plugin.java files).
For these files with non-unique basenames, we are faced with the task of
attempting to determine or guess which directory they may have been
relocated to. Such a task is precisely the job of directory rename
detection. However, there are two catches: (1) the directory rename
detection code has traditionally been part of the merge machinery rather
than diffcore-rename.c, and (2) directory rename detection currently
runs after regular rename detection is complete. The 1st catch is just
an implementation issue that can be overcome by some code shuffling.
The 2nd requires us to add a further approximation: we only have access
to exact renames at this point, so we need to do directory rename
detection based on just exact renames. In some cases we won't have
exact renames, in which case this extra optimization won't apply. We
also choose to not apply the optimization unless we know that the
underlying directory was removed, which will require extra data to be
passed in to diffcore_rename_extended(). Also, even if we get a
prediction about which directory a file may have relocated to, we will
still need to check to see if there is a file in the predicted
directory, and then compare the two files to see if they meet the higher
min_basename_score threshold required for marking the two files as
renames.
This commit introduces an idx_possible_rename() function which will
do this directory rename detection for us and give us the index within
rename_dst of the resulting filename. For now, this function is
hardcoded to return -1 (not found) and just hooks up how its results
would be used once we have a more complete implementation in place.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running the perf suite, we copy files from an existing $GIT_DIR to
a scratch repository to give us a realistic setup on which to operate.
Since the perf scripts themselves may modify the scratch repository, we
want to make sure we've scrubbed any references back to the original.
One existing example is that we avoid copying the file "commondir" at
the top-level of the repository. In a worktree git-dir (e.g.,
.git/worktrees/foo), that file contains the path to the parent
repository; copying it could mean ref updates in the scratch repository
affect the original.
But there are other files we should cover, too:
- "gitdir" in a worktree git-dir contains the path to the actual .git
file in the working tree. We _shouldn't_ end up looking at it at
all, since the lack of a "commondir" file means Git won't consider
this to be a worktree git-dir. But it's best to err on the safe
side.
- in a parent repository that contains worktrees, the
"$GIT_DIR/worktrees" directory will contain the git dirs for the
individual worktrees. Which will themselves contain commondir and
gitdir files that may reference the original repository. We should
likewise remove them.
Note that this does mean that the perf suite's scratch repositories
will never have any worktrees. That's OK; we don't have any perf tests
that are influenced by their presence. If we add any, they'd
probably want to create the worktrees themselves anyway.
This patch adds both paths to the set of omissions in
test_perf_copy_repo_contents(). Note that we won't get confused here by
matching arbitrary names like refs/heads/commondir. This list is always
matching top-level entries in $GIT_DIR (we rely on "cp -R" to do the
actual recursion).
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf suite gets confused when test_perf_default_repo is pointed at a
worktree (which includes when it is run from within a worktree at all,
since the default is to use the current repository).
Here's an example:
$ git worktree add ~/foo
Preparing worktree (new branch 'foo')
HEAD is now at 328c109303 The eighth batch
$ cd ~/foo
$ make
[...build output...]
$ cd t/perf
$ ./p0000-perf-lib-sanity.sh -v -i
[...]
perf 1 - test_perf_default_repo works:
running:
foo=$(git rev-parse HEAD) &&
test_export foo
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
The problem is that we didn't copy all of the necessary files from the
source repository (in this case we got HEAD, but we have no refs!). We
discover the git-dir with "rev-parse --git-dir", but this points to the
worktree's partial repository in .../.git/worktrees/foo.
That partial repository has a "commondir" file which points to the main
repository, where the actual refs are stored, but we don't copy it. This
is the correct thing to do, though! If we did copy it, then our scratch
test repo would be pointing back to the original main repo, and any ref
updates we made in the tests would impact that original repo.
Instead, we need to either:
1. Make a scratch copy of the original main repo (in addition to the
worktree repo), and point the scratch worktree repo's commondir at
it. This preserves the original relationship, but it's doubtful any
script really cares (if they are testing worktree performance,
they'd probably make their own worktrees). And it's trickier to get
right.
2. Collapse the main and worktree repos into a single scratch repo.
This can be done by copying everything from both, preferring any
files from the worktree repo.
This patch does the second one. With this applied, the example above
results in p0000 running successfully.
Reported-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gitattributes documentation mentions that either the clean cmd or
the smudge cmd can be left unspecified in a filter definition. However,
when the filter is marked as 'required', the absence of any one of these
two should be treated as an error. Git already fails under these
circumstances, but not always in a pleasant way: omitting a clean cmd in
a required filter triggers an assertion error which leaves the user with
a quite verbose message:
git: convert.c:1459: convert_to_git_filter_fd: Assertion "ca.drv->clean || ca.drv->process" failed.
This assertion is not really necessary, as the apply_filter() call below
it already performs the same check. And when this condition is not met,
the function returns 0, making the caller die() with a much nicer
message. (Also note that die()-ing here is the right behavior as
`would_convert_to_git_filter_fd() == true` is a precondition to use
convert_to_git_filter_fd(), and the former is only true when the filter
is required.) So remove the assertion and add two regression tests to
make sure that git fails nicely when either the smudge or clean command
is missing on a required filter.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two established generation number versions:
1: topological levels
2: corrected commit dates
The corrected commit dates are enabled by default, but they also write
extra data in the GDAT and GDOV chunks. Services that host Git data
might want to have more control over when this feature rolls out than
just updating the Git binaries.
Add a new "commitGraph.generationVersion" config option that specifies
the intended generation number version. If this value is less than 2,
then the GDAT chunk is never written _or read_ from an existing file.
This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT
environment variable in the test suite. Remove it.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method uses 'the_repository' in a few places. A
new need for a repository pointer is coming in the following change, so
group these instances into a local variable 'r' that could eventually
become part of the method signature, if so desired.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a remote is renamed don't change the canonical "*.pushRemote"
form to "*.pushremote". Fixes and tests for a minor bug in
923d4a5ca4 (remote rename/remove: handle branch.<name>.pushRemote
config values, 2020-01-27). See the preceding commit for why this does
& doesn't matter.
While we're at it let's also test that we handle the "*.pushDefault"
key correctly. The code to handle that was added in
b3fd6cbf29 (remote rename/remove: gently handle remote.pushDefault
config, 2020-02-01) and does the right thing, but nothing tested that
we wrote out the canonical camel-cased form.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change "git remote add" so that it adds a *.tagOpt key, and not the
lower-cased *.tagopt on "git remote add --no-tags", just as "git clone
--no-tags" would do.
This doesn't matter for anything that reads the config. It's just
prettier if we write config keys in their documented camelCase form to
user-readable config files.
When I added support for "clone -no-tags" in 0dab2468ee (clone: add a
--no-tags option to clone without tags, 2017-04-26) I made it use
the *.tagOpt form, but the older "git remote add" added in
111fb85865 (remote add: add a --[no-]tags option, 2010-04-20) has
been using *.tagopt all this time.
It's easy enough to add a test for this, so let's do that. We can't
use "git config -l" there, because it'll normalize the keys to their
lower-cased form. Let's add the test for "git clone" too for good
measure, not just to the "git remote" codepath we're fixing.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ds/chunked-file-api:
commit-graph.c: display correct number of chunks when writing
chunk-format: add technical docs
chunk-format: restore duplicate chunk checks
midx: use 64-bit multiplication for chunk sizes
midx: use chunk-format read API
commit-graph: use chunk-format read API
chunk-format: create read chunk API
midx: use chunk-format API in write_midx_internal()
midx: drop chunk progress during write
midx: return success/failure in chunk write methods
midx: add num_large_offsets to write_midx_context
midx: add pack_perm to write_midx_context
midx: add entries to write_midx_context
midx: use context in write_midx_pack_names()
midx: rename pack_info to write_midx_context
commit-graph: use chunk-format write API
chunk-format: create chunk format write API
commit-graph: anonymize data in chunk_write_fn
Add a new GIT_OBJS variable, with the objects sufficient to get to a
git.o or common-main.o.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the order of the OBJECTS assignment, this makes a follow-up
change where we split it up into two variables smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split up the long OBJECTS line into multiple lines using the "+="
assignment we commonly use elsewhere in the Makefile when these lines
get unwieldy.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add TEST_OBJS to the list of other *_OBJS variables we reset. We had
already established this pattern when TEST_OBJS was introduced in
daa99a9172 (Makefile: make sure test helpers are rebuilt when headers
change, 2010-01-26), but it wasn't added to the list in that commit
along with the rest.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Often it is useful to both:
- have relatively few packfiles in a repository, and
- avoid having so few packfiles in a repository that we repack its
entire contents regularly
This patch implements a '--geometric=<n>' option in 'git repack'. This
allows the caller to specify that they would like each pack to be at
least a factor times as large as the previous largest pack (by object
count).
Concretely, say that a repository has 'n' packfiles, labeled P1, P2,
..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'.
With a geometric factor of 'r', it should be that:
objects(Pi) > r*objects(P(i-1))
for all i in [1, n], where the packs are sorted by
objects(P1) <= objects(P2) <= ... <= objects(Pn).
Since finding a true optimal repacking is NP-hard, we approximate it
along two directions:
1. We assume that there is a cutoff of packs _before starting the
repack_ where everything to the right of that cut-off already forms
a geometric progression (or no cutoff exists and everything must be
repacked).
2. We assume that everything smaller than the cutoff count must be
repacked. This forms our base assumption, but it can also cause
even the "heavy" packs to get repacked, for e.g., if we have 6
packs containing the following number of objects:
1, 1, 1, 2, 4, 32
then we would place the cutoff between '1, 1' and '1, 2, 4, 32',
rolling up the first two packs into a pack with 2 objects. That
breaks our progression and leaves us:
2, 1, 2, 4, 32
^
(where the '^' indicates the position of our split). To restore a
progression, we move the split forward (towards larger packs)
joining each pack into our new pack until a geometric progression
is restored. Here, that looks like:
2, 1, 2, 4, 32 ~> 3, 2, 4, 32 ~> 5, 4, 32 ~> ... ~> 9, 32
^ ^ ^ ^
This has the advantage of not repacking the heavy-side of packs too
often while also only creating one new pack at a time. Another wrinkle
is that we assume that loose, indexed, and reflog'd objects are
insignificant, and lump them into any new pack that we create. This can
lead to non-idempotent results.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a recent patch we added a function 'find_kept_pack_entry()' to look
for an object only among kept packs.
While this function avoids doing any lookup work in non-kept packs, it
is still linear in the number of packs, since we have to traverse the
linked list of packs once per object. Let's cache a reduced version of
that list to save us time.
Note that this cache will last the lifetime of the program. We could
invalidate it on reprepare_packed_git(), but there's not much point in
being rigorous here:
- we might already fail to notice new .keep packs showing up after the
program starts. We only reprepare_packed_git() when we fail to find
an object. But adding a new pack won't cause that to happen.
Somebody repacking could add a new pack and delete an old one, but
most of the time we'd have a descriptor or mmap open to the old
pack anyway, so we might not even notice.
- in pack-objects we already cache the .keep state at startup, since
56dfeb6263 (pack-objects: compute local/ignore_pack_keep early,
2016-07-29). So this is just extending that concept further.
- we don't have to worry about any packed_git being removed; we always
keep the old structs around, even after reprepare_packed_git()
We do defensively invalidate the cache in case the set of kept packs
being asked for changes (e.g., only in-core kept packs were cached, but
suddenly the caller also wants on-disk kept packs, too). In theory we
could build all three caches and switch between them, but it's not
necessary, since this patch (and series) never changes the set of kept
packs that it wants to inspect from the cache.
So that "optimization" is more about being defensive in the face of
future changes than it is about asking for multiple kinds of kept packs
in this patch.
Here are p5303 results (as always, measured against the kernel):
Test HEAD^ HEAD
-----------------------------------------------------------------------------------------------
5303.5: repack (1) 57.34(54.66+10.88) 56.98(54.36+10.98) -0.6%
5303.6: repack with kept (1) 57.38(54.83+10.49) 57.17(54.97+10.26) -0.4%
5303.11: repack (50) 71.70(88.99+4.74) 71.62(88.48+5.08) -0.1%
5303.12: repack with kept (50) 72.58(89.61+4.78) 71.56(88.80+4.59) -1.4%
5303.17: repack (1000) 217.19(491.72+14.25) 217.31(490.82+14.53) +0.1%
5303.18: repack with kept (1000) 246.12(520.07+14.93) 217.08(490.37+15.10) -11.8%
and the --stdin-packs case, which scales a little bit better (although
not by that much even at 1,000 packs):
5303.7: repack with --stdin-packs (1) 0.00(0.00+0.00) 0.00(0.00+0.00) =
5303.13: repack with --stdin-packs (50) 3.43(11.75+0.24) 3.43(11.69+0.30) +0.0%
5303.19: repack with --stdin-packs (1000) 130.50(307.15+7.66) 125.13(301.36+8.04) -4.1%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have find_kept_pack_entry(), we don't have to manually keep
hunting through every pack to find a possible "kept" duplicate of the
object. This should be faster, assuming only a portion of your total
packs are actually kept.
Note that we have to re-order the logic a bit here; we can deal with the
disqualifying situations first (e.g., finding the object in a non-local
pack with --local), then "kept" situation(s), and then just fall back to
other "--local" conditions.
Here are the results from p5303 (measurements again taken on the
kernel):
Test HEAD^ HEAD
-----------------------------------------------------------------------------------------------
5303.5: repack (1) 57.26(54.59+10.84) 57.34(54.66+10.88) +0.1%
5303.6: repack with kept (1) 57.33(54.80+10.51) 57.38(54.83+10.49) +0.1%
5303.11: repack (50) 71.54(88.57+4.84) 71.70(88.99+4.74) +0.2%
5303.12: repack with kept (50) 85.12(102.05+4.94) 72.58(89.61+4.78) -14.7%
5303.17: repack (1000) 216.87(490.79+14.57) 217.19(491.72+14.25) +0.1%
5303.18: repack with kept (1000) 665.63(938.87+15.76) 246.12(520.07+14.93) -63.0%
and the --stdin-packs timings:
5303.7: repack with --stdin-packs (1) 0.01(0.01+0.00) 0.00(0.00+0.00) -100.0%
5303.13: repack with --stdin-packs (50) 3.53(12.07+0.24) 3.43(11.75+0.24) -2.8%
5303.19: repack with --stdin-packs (1000) 195.83(371.82+8.10) 130.50(307.15+7.66) -33.4%
So our repack with an empty .keep pack is roughly as fast as one without
a .keep pack up to 50 packs. But the --stdin-packs case scales a little
better, too.
Notably, it is faster than a repack of the same size and a kept pack. It
looks at fewer objects, of course, but the penalty for looking at many
packs isn't as costly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add two new tests to measure repack performance. Both tests split the
repository into synthetic "pushes", and then leave the remaining objects
in a big base pack.
The first new test marks an empty pack as "kept" and then passes
--honor-pack-keep to avoid including objects in it. That doesn't change
the resulting pack, but it does let us compare to the normal repack case
to see how much overhead we add to check whether objects are kept or
not.
The other test is of --stdin-packs, which gives us a sense of how that
number scales based on the number of packs we provide as input. In each
of those tests, the empty pack isn't considered, but the residual pack
(objects that were left over and not included in one of the synthetic
push packs) is marked as kept.
(Note that in the single-pack case of the --stdin-packs test, there is
nothing do since there are no non-excluded packs).
Here are some timings on a recent clone of the kernel:
5303.5: repack (1) 57.26(54.59+10.84)
5303.6: repack with kept (1) 57.33(54.80+10.51)
in the 50-pack case, things start to slow down:
5303.11: repack (50) 71.54(88.57+4.84)
5303.12: repack with kept (50) 85.12(102.05+4.94)
and by the time we hit 1,000 packs, things are substantially worse, even
though the resulting pack produced is the same:
5303.17: repack (1000) 216.87(490.79+14.57)
5303.18: repack with kept (1000) 665.63(938.87+15.76)
That's because the code paths around handling .keep files are known to
scale badly; they look in every single pack file to find each object.
Our solution to that was to notice that most repos don't have keep
files, and to make that case a fast path. But as soon as you add a
single .keep, that part of pack-objects slows down again (even if we
have fewer objects total to look at).
Likewise, the scaling is pretty extreme on --stdin-packs (but each
subsequent test is also being asked to do more work):
5303.7: repack with --stdin-packs (1) 0.01(0.01+0.00)
5303.13: repack with --stdin-packs (50) 3.53(12.07+0.24)
5303.19: repack with --stdin-packs (1000) 195.83(371.82+8.10)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are in a helper function, so the usual chain-lint doesn't notice
them. This function is still not perfect, as it has some git invocations
on the left-hand-side of the pipe, but it's primary purpose is timing,
not finding bugs or correctness issues.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an upcoming commit, 'git repack' will want to create a pack comprised
of all of the objects in some packs (the included packs) excluding any
objects in some other packs (the excluded packs).
This caller could iterate those packs themselves and feed the objects it
finds to 'git pack-objects' directly over stdin, but this approach has a
few downsides:
- It requires every caller that wants to drive 'git pack-objects' in
this way to implement pack iteration themselves. This forces the
caller to think about details like what order objects are fed to
pack-objects, which callers would likely rather not do.
- If the set of objects in included packs is large, it requires
sending a lot of data over a pipe, which is inefficient.
- The caller is forced to keep track of the excluded objects, too, and
make sure that it doesn't send any objects that appear in both
included and excluded packs.
But the biggest downside is the lack of a reachability traversal.
Because the caller passes in a list of objects directly, those objects
don't get a namehash assigned to them, which can have a negative impact
on the delta selection process, causing 'git pack-objects' to fail to
find good deltas even when they exist.
The caller could formulate a reachability traversal themselves, but the
only way to drive 'git pack-objects' in this way is to do a full
traversal, and then remove objects in the excluded packs after the
traversal is complete. This can be detrimental to callers who care
about performance, especially in repositories with many objects.
Introduce 'git pack-objects --stdin-packs' which remedies these four
concerns.
'git pack-objects --stdin-packs' expects a list of pack names on stdin,
where 'pack-xyz.pack' denotes that pack as included, and
'^pack-xyz.pack' denotes it as excluded. The resulting pack includes all
objects that are present in at least one included pack, and aren't
present in any excluded pack.
To address the delta selection problem, 'git pack-objects --stdin-packs'
works as follows. First, it assembles a list of objects that it is going
to pack, as above. Then, a reachability traversal is started, whose tips
are any commits mentioned in included packs. Upon visiting an object, we
find its corresponding object_entry in the to_pack list, and set its
namehash parameter appropriately.
To avoid the traversal visiting more objects than it needs to, the
traversal is halted upon encountering an object which can be found in an
excluded pack (by marking the excluded packs as kept in-core, and
passing --no-kept-objects=in-core to the revision machinery).
This can cause the traversal to halt early, for example if an object in
an included pack is an ancestor of ones in excluded packs. But stopping
early is OK, since filling in the namehash fields of objects in the
to_pack list is only additive (i.e., having it helps the delta selection
process, but leaving it blank doesn't impact the correctness of the
resulting pack).
Even still, it is unlikely that this hurts us much in practice, since
the 'git repack --geometric' caller (which is introduced in a later
commit) marks small packs as included, and large ones as excluded.
During ordinary use, the small packs usually represent pushes after a
large repack, and so are unlikely to be ancestors of objects that
already exist in the repository.
(I found it convenient while developing this patch to have 'git
pack-objects' report the number of objects which were visited and got
their namehash fields filled in during traversal. This is also included
in the below patch via trace2 data lines).
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future caller will want to be able to perform a reachability traversal
which terminates when visiting an object found in a kept pack. The
closest existing option is '--honor-pack-keep', but this isn't quite
what we want. Instead of halting the traversal midway through, a full
traversal is always performed, and the results are only trimmed
afterwords.
Besides needing to introduce a new flag (since culling results
post-facto can be different than halting the traversal as it's
happening), there is an additional wrinkle handling the distinction
in-core and on-disk kept packs. That is: what kinds of kept pack should
stop the traversal?
Introduce '--no-kept-objects[=<on-disk|in-core>]' to specify which kinds
of kept packs, if any, should stop a traversal. This can be useful for
callers that want to perform a reachability analysis, but want to leave
certain packs alone (for e.g., when doing a geometric repack that has
some "large" packs which are kept in-core that it wants to leave alone).
Note that this option is not guaranteed to produce exactly the set of
objects that aren't in kept packs, since it's possible the traversal
order may end up in a situation where a non-kept ancestor was "cut off"
by a kept object (at which point we would stop traversing). But, we
don't care about absolute correctness here, since this will eventually
be used as a purely additive guide in an upcoming new repack mode.
Explicitly avoid documenting this new flag, since it is only used
internally. In theory we could avoid even adding it rev-list, but being
able to spell this option out on the command-line makes some special
cases easier to test without promising to keep it behaving consistently
forever. Those tricky cases are exercised in t6114.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Future callers will want a function to fill a 'struct pack_entry' for a
given object id but _only_ from its position in any kept pack(s).
In particular, an new 'git repack' mode which ensures the resulting
packs form a geometric progress by object count will mark packs that it
does not want to repack as "kept in-core", and it will want to halt a
reachability traversal as soon as it visits an object in any of the kept
packs. But, it does not want to halt the traversal at non-kept, or
.keep packs.
The obvious alternative is 'find_pack_entry()', but this doesn't quite
suffice since it only returns the first pack it finds, which may or may
not be kept (and the mru cache makes it unpredictable which one you'll
get if there are options).
Short of that, you could walk over all packs looking for the object in
each one, but it scales with the number of packs, which may be
prohibitive.
Introduce 'find_kept_pack_entry()', a function which is like
'find_pack_entry()', but only fills in objects in the kept packs.
Handle packs which have .keep files, as well as in-core kept packs
separately, since certain callers will want to distinguish one from the
other. (Though on-disk and in-core kept packs share the adjective
"kept", it is best to think of the two sets as independent.)
There is a gotcha when looking up objects that are duplicated in kept
and non-kept packs, particularly when the MIDX stores the non-kept
version and the caller asked for kept objects only. This could be
resolved by teaching the MIDX to resolve duplicates by always favoring
the kept pack (if one exists), but this breaks an assumption in existing
MIDXs, and so it would require a format change.
The benefit to changing the MIDX in this way is marginal, so we instead
have a more thorough check here which is explained with a comment.
Callers will be added in subsequent patches.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the definitions of the pcre2_{malloc,free} functions above the
compile_pcre2_pattern() function they're used in.
Before the preceding commit they used to be needed earlier, but now we
can move them to be adjacent to the other PCREv2 functions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the setup of the "pcre2_general_context" to happen per-thread
in compile_pcre2_pattern() instead of in grep_init().
This change brings it in line with how the rest of the pcre2_* members
in the grep_pat structure are set up.
As noted in the preceding commit the approach 513f2b0bbd (grep: make
PCRE2 aware of custom allocator, 2019-10-16) took to allocate the
pcre2_general_context seems to have been initially based on a
misunderstanding of how PCREv2 memory allocation works.
The approach of creating a global context in grep_init() is just added
complexity for almost zero gain. On my system it's 24 bytes saved
per-thread. For comparison PCREv2 will then go on to allocate at least
a kilobyte for its own thread-local state.
As noted in 6d423dd542 (grep: don't redundantly compile throwaway
patterns under threading, 2017-05-25) the grep code is intentionally
not trying to micro-optimize allocations by e.g. sharing some PCREv2
structures globally, while making others thread-local.
So let's remove this special case and make all of them thread-local
again for simplicity. With this change we could move the
pcre2_{malloc,free} functions around to live closer to their current
use. I'm not doing that here to keep this change small, that cleanup
will be done in a follow-up commit.
See also the discussion in 94da9193a6 (grep: add support for PCRE v2,
2017-06-01) about thread safety, and Johannes's comments[1] to the
effect that we should be doing what this patch is doing.
1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue work started in 513f2b0bbd (grep: make PCRE2 aware of custom
allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}().
functions for allocation. We'll now use it for all PCREv2 allocations.
The reason 513f2b0bbd worked as a bugfix for the USE_NED_ALLOCATOR
issue is because it targeted the allocation freed via free(), as
opposed to by a pcre2_*free() function. I.e. the pcre2_maketables()
and pcre2_maketables_free() pair.
For most of the rest we continued allocating with stock malloc()
inside PCREv2 itself, but didn't segfault because we'd use its
corresponding free().
In a preceding commit of mine I changed the free() to
pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as
far as fixing the segfault goes we could revert 513f2b0bbd. But then
we wouldn't use the desired allocator, let's just use it instead.
Before this patch we'd on e.g.:
grep --threads=1 -iP æ.*var.*xyz
Only use pcre2_{malloc,free}() for 2 malloc() calls and 2
corresponding free() calls. Now it's 12 calls to each. This can be
observed with the GREP_PCRE2_DEBUG_MALLOC debug mode.
Reading the history of how this bug got introduced it wasn't present
in Johannes's original patch[1] to fix the issue.
My reading of that thread is that the approach the follow-up patches
to Johannes's original pursued were based on misunderstanding of how
the PCREv2 API works. In particular this part of [2]:
"most of the time (like when using UTF-8) the chartable (and
therefore the global context) is not needed (even when using
alternate allocators)"
That's simply not how PCREv2 memory allocation works. It's easy to see
how the misunderstanding came about. It's because (as noted above) the
issue was noticed because of our use of free() in our own grep.c for
freeing the memory allocated by pcre2_maketables().
Thus the misunderstanding that PCREv2's compile context is something
only needed for pcre2_maketables(), and e.g. an aborted earlier
attempt[3] to only set it up when we ourselves called
pcre2_maketables().
That's not what PCREv2's compile context is. To quote PCREv2's
documentation:
"This context just contains pointers to (and data for) external
memory management functions that are called from several places in
the PCRE2 library."
Thus the failed attempts to go down the route of only creating the
general context in cases where we ourselves call pcre2_maketables(),
before finally settling on the approach 513f2b0bbd took of always
creating it, but then mostly not using it.
Instead we should always create it, and then pass the general context
to those functions that accept it, so that they'll consistently use
our preferred memory allocation functions.
1. https://lore.kernel.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/
3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make use of the pcre2_maketables_free() function to free the memory
allocated by pcre2_maketables().
At first sight it's strange that 10da030ab7 (grep: avoid leak of
chartables in PCRE2, 2019-10-16) which added the free() call here
doesn't make use of the pcre2_free() the author introduced in the
preceding commit in 513f2b0bbd (grep: make PCRE2 aware of custom
allocator, 2019-10-16).
The reason is that at the time the function didn't exist. It was first
introduced in PCREv2 version 10.34, released on 2019-11-21.
Let's make use of it behind a macro. I don't think this matters for
anything to do with custom allocators, but it makes our use of PCREv2
more discoverable.
At some distant point in the future we'll be able to drop the version
guard, as nobody will be running a version older than 10.34.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a use of pcre2_config(PCRE2_CONFIG_VERSION, ...) which I added
in 95ca1f987e (grep/pcre2: better support invalid UTF-8 haystacks,
2021-01-24) with the same test done at compile-time.
It might be cuter to do this at runtime since we don't have to do the
"major >= 11 || (major >= 10 && ...)" test. But in the next commit
we'll add another version comparison that absolutely needs to be done
at compile-time, so we're better of being consistent across the board.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add optional printing of PCREv2 allocations to stderr for a developer
who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to "1".
You need to manually change the definition in the source file similar
to the DEBUG_MAILMAP, there's no Makefile knob for this.
This will be referenced a subsequent commit, and is generally useful
to manually see what's going on with PCREv2 allocations while working
on that code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change pcre2_malloc() in a way that'll make it easier for a debugging
fprintf() to spew out the allocated pointer.
This doesn't introduce any functional change, it just makes a
subsequent commit's diff easier to read. Changes code added in
513f2b0bbd (grep: make PCRE2 aware of custom allocator, 2019-10-16).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a redundant assignment of pcre2_compile_context dating back to
my 94da9193a6 (grep: add support for PCRE v2, 2017-06-01).
In create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no
need to NULL out individual members here.
I think this was probably something left over from an earlier
development version of mine.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop an assignment added in b65abcafc7 (grep: use PCRE v2 for
optimized fixed-string search, 2019-07-01) and the overly cautious
assert() I added in 94da9193a6 (grep: add support for PCRE v2,
2017-06-01).
There was never a good reason for this, it's just a relic from when I
initially wrote the PCREv2 support. We're not going to have confusion
about compile_pcre2_pattern() being called when it shouldn't just
because we forgot to cargo-cult this opt->pcre2 option.
Furthermore the "struct grep_opt" is (mostly) used for the options the
user supplied, let's avoid the pattern of needlessly assigning to it.
With my recent removal of the PCREv1 backend in 7599730b7e (Remove
support for v1 of the PCRE library, 2021-01-24) there's even less
confusion around what we call where in these codepaths, which is one
more reason to remove this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow restricting the tags used by the placeholder %(describe) with the
options match and exclude. E.g. the following command describes the
current commit using official version tags, without those for release
candidates:
$ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a format placeholder for describe output. Implement it by actually
calling git describe, which is simple and guarantees correctness. It's
intended to be used with $Format:...$ in files with the attribute
export-subst and git archive. It can also be used with git log etc.,
even though that's going to be slow due to the fork for each commit.
Suggested-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with .gitattributes and .gitignore, we would like to make sure that
.mailmap files are handled consistently whether read from the a blob (as
is the default behavior in a bare repo) or from the filesystem.
Likewise, we would like to avoid reading out-of-tree files pointed to by
a symlink, which could have security implications in certain setups.
We can cover both by using open_nofollow() when opening the in-tree
files. We'll continue to follow links for mailmap.file, as well as when
reading .mailmap from the current directory when outside of a repository
entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with .gitattributes, we would like to make sure that .gitignore files
are handled consistently whether read from the index or from the
filesystem. Likewise, we would like to avoid reading out-of-tree files
pointed to by the symlinks, which could have security implications in
certain setups.
We can cover both by using open_nofollow() when opening the in-tree
files. We'll continue to follow links for core.excludesFile, as well as
$GIT_DIR/info/exclude.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The attributes system may sometimes read in-tree files from the
filesystem, and sometimes from the index. In the latter case, we do not
resolve symbolic links (and are not likely to ever start doing so).
Let's open filesystem links with O_NOFOLLOW so that the two cases behave
consistently.
As a bonus, this means that git will not follow such symlinks to read
and parse out-of-tree paths. In some cases this could have security
implications, as a malicious repository can cause Git to open and read
arbitrary files. It could already feed arbitrary content to the parser,
but in certain setups it might be able to exfiltrate data from those
paths (e.g., if an automated service operating on the malicious repo
reveals its stderr to an attacker).
Note that O_NOFOLLOW only prevents following links for the path itself,
not intermediate directories in the path. At first glance, it seems
like
ln -s /some/path in-repo
might still look at "in-repo/.gitattributes", following the symlink to
"/some/path/.gitattributes". However, if "in-repo" is a symbolic link,
then we know that it has no git paths below it, and will never look at
its .gitattributes file.
We will continue to support out-of-tree symbolic links (e.g., in
$GIT_DIR/info/attributes); this just affects in-tree links. When a
symbolic link is encountered, the contents are ignored and a warning is
printed. POSIX specifies ELOOP in this case, so the user would generally
see something like:
warning: unable to access '.gitattributes': Too many levels of symbolic links
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a number of callers of add_patterns() and its sibling
functions. Let's give them a "flags" parameter for adding new options
without having to touch each caller. We'll use this in a future patch to
add O_NOFOLLOW support. But for now each caller just passes 0.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The attribute code can have a rather deep callstack, through
which we have to pass the "macro_ok" flag. In anticipation
of adding other flags, let's convert this to a generic
bit-field.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of open() would like to use O_NOFOLLOW, but it is not
available on all platforms. Let's abstract this into a helper function
so we can provide system-specific implementations.
Some light web-searching reveals that we might be able to get something
similar on Windows using FILE_FLAG_OPEN_REPARSE_POINT. I didn't dig into
this further.
For other systems without O_NOFOLLOW or any equivalent, we have two
options for fallback:
- we can just open anyway, following symlinks; this may have security
implications (e.g., following untrusted in-tree symlinks)
- we can determine whether the path is a symlink with lstat().
This is slower (two syscalls instead of one), but that may be
acceptable for infrequent uses like looking up .gitattributes files
(especially because we can get away with a single syscall for the
common case of ENOENT).
It's also racy, but should be sufficient for our needs (we are
worried about in-tree symlinks that we ourselves would have
previously created). We could make it non-racy at the cost of making
it even slower, by doing an fstat() on the opened descriptor and
comparing the dev/ino fields to the original lstat().
This patch implements the lstat() option in its slightly-faster racy
form.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the test, FAKE_COMMIT_MESSAGE replaces the commit message each
time it is invoked so there will be only one instance of "Modified-A3"
no matter how many times we invoke the editor. Let's fix this and use
FAKE_COMMIT_AMEND instead so that it adds "Modified-A3" once for each
time the editor is invoked.
This patch also removes the check for counting the number of
"Modified-A3" lines and instead compares the whole message to check
that the commenting code works correctly for 'fixup -c' as well as
'fixup -C'.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the named commits in the tests so that they will still refer to the
same commit if the setup gets changed in the future whereas 'branch~2'
will change which commit it points to.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's simplify the test_commit_message() helper function and add
comments to the function.
This patch also document the working of 'fixup -C' with "amend!" in the
test-description.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As it is currently implemented, it's too difficult to follow along and
remember the value of "expected-message" from test to test. It also
makes it difficult to extend tests or add new tests in between existing
tests without negatively impacting other tests.
Let's set up "expected-message" to the precise content needed by the
test, so that both the problems go away and also makes easier to run
tests selectively with '--run' or 'GIT_SKIP_TESTS'
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The most common way to format here-docs in Git test scripts is for the
body and EOF to be indented the same amount as the command which opened
the here-doc. Fix a few here-docs in this script to conform to that
standard and also remove the unnecessary curly braces.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `-c` says "edit the commit message" it's not clear what will be
edited. The original's commit message or the replacement's message or a
combination of the two. Word it such that it states more precisely what
exactly will be edited. While at it, also drop the jarring period and
capitalization, neither of which is otherwise present in the message.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the commit subject starts with "amend!" then rearrange it like a
"fixup!" commit and replace `pick` command with `fixup -C` command,
which is used to fixup up the content if any and replaces the original
commit message with amend! commit's message.
Original-patch-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add options to `fixup` command to fixup both the commit contents and
message. `fixup -C` command is used to replace the original commit
message and `fixup -c`, additionally allows to edit the commit message.
Original-patch-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When squashing commit messages the squash!/fixup! subjects are not of
interest so comment them out to stop them becoming part of the final
message.
This change breaks a bunch of --autosquash tests which rely on the
"squash! <subject>" line appearing in the final commit message. This is
addressed by adding a second line to the commit message of the "squash!
..." commits and testing for that.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file "$GIT_DIR/rebase-merge/fixup-message" is only used for fixup
commands, there's no point in writing it for squash commands as it is
immediately deleted.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 17:50:11 -08:00
818 changed files with 100293 additions and 74103 deletions
@ -27,12 +27,11 @@ precedence, the last matching pattern decides the outcome):
them.
* Patterns read from a `.gitignore` file in the same directory
as the path, or in any parent directory, with patterns in the
higher level files (up to the toplevel of the work tree) being overridden
by those in lower level files down to the directory containing the file.
These patterns match relative to the location of the
`.gitignore` file. A project normally includes such
`.gitignore` files in its repository, containing patterns for
as the path, or in any parent directory (up to the top-level of the working
tree), with patterns in the higher level files being overridden by those in
lower level files down to the directory containing the file. These patterns
match relative to the location of the `.gitignore` file. A project normally
includes such `.gitignore` files in its repository, containing patterns for
files generated as part of the project build.
* Patterns read from `$GIT_DIR/info/exclude`.
@ -149,11 +148,15 @@ not tracked by Git remain untracked.
To stop tracking a file that is currently tracked, use
'git rm --cached'.
Git does not follow symbolic links when accessing a `.gitignore` file in
the working tree. This keeps behavior consistent when the file is
accessed from the index or a tree versus from the filesystem.
EXAMPLES
--------
- The pattern `hello.*` matches any file or folder
whose name begins with `hello`. If one wants to restrict
whose name begins with `hello.`. If one wants to restrict
this only to the directory and not in its subdirectories,
one can prepend the pattern with a slash, i.e. `/hello.*`;
the pattern now matches `hello.txt`, `hello.c` but not
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.