mirrors/git - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Csókás, Bence	c852531f45	git-send-email: use sanitized address when reading mbox body Addresses that are mentioned on the trailers in the commit log messages (e.g., "Reviewed-by") are added to the "Cc:" list by "git send-email". These hand-written addresses, however, may be malformed (e.g., having unquoted "." and other punctutation marks in the display-name part) and can upset MTA. The code does use the sanitize_address() helper on these address-looking strings to turn them into valid addresses, but it is used only to see if the address should be suppressed. The original string taken from the message is added to the @cc list if the code decides the address is not suppressed. Because the addresses on trailer lines are hand-written and more likely to contain malformed addresses, when adding to the @cc list, use the result from sanitize_address, not the original. Note that we do not modify the behaviour for addresses taken from the e-mail headers, as they are more likely to be machine generated and well-formed. Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-07-01 11:38:29 -07:00
Orgad Shaneh	f402c7941f	git-gui: fix inability to quit after closing another instance If you open 2 git gui instances in the same directory, then close one of them and try to close the other, an error message pops up, saying: 'error renaming ".git/GITGUI_BCK": no such file or directory', and it is no longer possible to close the window ever. Fix by catching this error, and proceeding even if the file no longer exists. Signed-off-by: Orgad Shaneh <orgads@gmail.com>	2024-06-30 09:15:04 +03:00
Junio C Hamano	790a17fb19	Sync with 'maint'	2024-06-28 16:03:59 -07:00
Junio C Hamano	09e5e7f718	More post 2.45.2 updates from the 'master' front Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-28 15:53:19 -07:00
Junio C Hamano	5d5675515e	Merge branch 'ds/ahead-behind-fix' into maint-2.45 Fix for a progress bar. * ds/ahead-behind-fix: commit-graph: increment progress indicator	2024-06-28 15:53:19 -07:00
Junio C Hamano	112bd6a67c	Merge branch 'ds/doc-add-interactive-singlekey' into maint-2.45 Doc update. * ds/doc-add-interactive-singlekey: doc: interactive.singleKey is disabled by default	2024-06-28 15:53:18 -07:00
Junio C Hamano	ce359a4dcc	Merge branch 'jc/varargs-attributes' into maint-2.45 Varargs functions that are unannotated as printf-like or execl-like have been annotated as such. * jc/varargs-attributes: __attribute__: add a few missing format attributes __attribute__: mark some functions with LAST_ARG_MUST_BE_NULL __attribute__: remove redundant attribute declaration for git_die_config() __attribute__: trace2_region_enter_printf() is like "printf"	2024-06-28 15:53:18 -07:00
Junio C Hamano	b2a62b6a42	Merge branch 'ps/ci-fix-detection-of-ubuntu-20' into maint-2.45 Fix for an embarrassing typo that prevented Python2 tests from running anywhere. * ps/ci-fix-detection-of-ubuntu-20: ci: fix check for Ubuntu 20.04	2024-06-28 15:53:17 -07:00
Junio C Hamano	f30e5332e4	Merge branch 'jk/cap-exclude-file-size' into maint-2.45 An overly large ".gitignore" files are now rejected silently. * jk/cap-exclude-file-size: dir.c: reduce max pattern file size to 100MB dir.c: skip .gitignore, etc larger than INT_MAX	2024-06-28 15:53:17 -07:00
Junio C Hamano	ce75d32b99	Merge branch 'jc/safe-directory-leading-path' into maint-2.45 The safe.directory configuration knob has been updated to optionally allow leading path matches. * jc/safe-directory-leading-path: safe.directory: allow "lead/ing/path/*" match	2024-06-28 15:53:17 -07:00
Junio C Hamano	7b7db54b83	Merge branch 'rs/difftool-env-simplify' into maint-2.45 Code simplification. * rs/difftool-env-simplify: difftool: add env vars directly in run_file_diff()	2024-06-28 15:53:16 -07:00
Junio C Hamano	6e3eb346ed	Merge branch 'ps/fix-reinit-includeif-onbranch' into maint-2.45 "git init" in an already created directory, when the user configuration has includeif.onbranch, started to fail recently, which has been corrected. * ps/fix-reinit-includeif-onbranch: setup: fix bug with "includeIf.onbranch" when initializing dir	2024-06-28 15:53:16 -07:00
Junio C Hamano	903b4da27f	Merge branch 'es/chainlint-ncores-fix' into maint-2.45 The chainlint script (invoked during "make test") did nothing when it failed to detect the number of available CPUs. It now falls back to 1 CPU to avoid the problem. * es/chainlint-ncores-fix: chainlint.pl: latch CPU count directly reported by /proc/cpuinfo chainlint.pl: fix incorrect CPU count on Linux SPARC chainlint.pl: make CPU count computation more robust	2024-06-28 15:53:15 -07:00
Junio C Hamano	2988b82b87	Merge branch 'jc/rev-parse-fatal-doc' into maint-2.45 Doc update. * jc/rev-parse-fatal-doc: rev-parse: document how --is-* options work outside a repository	2024-06-28 15:53:14 -07:00
Junio C Hamano	0d56a5946a	Merge branch 'jc/doc-diff-name-only' into maint-2.45 The documentation for "git diff --name-only" has been clarified that it is about showing the names in the post-image tree. * jc/doc-diff-name-only: diff: document what --name-only shows	2024-06-28 15:53:14 -07:00
Junio C Hamano	db9d38d9bb	Merge branch 'mt/t0211-typofix' into maint-2.45 Test fix. * mt/t0211-typofix: t/t0211-trace2-perf.sh: fix typo patern -> pattern	2024-06-28 15:53:13 -07:00
Junio C Hamano	db15f4d794	Merge branch 'dg/fetch-pack-code-cleanup' into maint-2.45 Code clean-up to remove an unused struct definition. * dg/fetch-pack-code-cleanup: fetch-pack: remove unused 'struct loose_object_iter'	2024-06-28 15:53:13 -07:00
Junio C Hamano	0c6c514c50	Merge branch 'dm/update-index-doc-fix' into maint-2.45 Doc fix. * dm/update-index-doc-fix: documentation: git-update-index: add --show-index-version to synopsis	2024-06-28 15:53:12 -07:00
Junio C Hamano	b608b33f3d	Merge branch 'ds/scalar-reconfigure-all-fix' into maint-2.45 Scalar fix. * ds/scalar-reconfigure-all-fix: scalar: avoid segfault in reconfigure --all	2024-06-28 15:53:12 -07:00
Junio C Hamano	abfdc596d8	Merge branch 'vd/doc-merge-tree-x-option' into maint-2.45 Doc update. * vd/doc-merge-tree-x-option: Documentation/git-merge-tree.txt: document -X	2024-06-28 15:53:11 -07:00
Junio C Hamano	0d23421e2a	Merge branch 'fa/p4-error' into maint-2.45 P4 update. * fa/p4-error: git-p4: show Perforce error to the user	2024-06-28 15:53:11 -07:00
Junio C Hamano	6840423c6f	Merge branch 'tb/attr-limits' into maint-2.45 The maximum size of attribute files is enforced more consistently. * tb/attr-limits: attr.c: move ATTR_MAX_FILE_SIZE check into read_attr_from_buf()	2024-06-28 15:53:10 -07:00
Junio C Hamano	a5adab9b16	Merge branch 'rs/diff-parseopts-cleanup' into maint-2.45 Code clean-up to remove code that is now a noop. * rs/diff-parseopts-cleanup: diff-lib: stop calling diff_setup_done() in do_diff_cache()	2024-06-28 15:53:10 -07:00
Junio C Hamano	fc636b413b	Merge branch 'dk/zsh-git-repo-path-fix' into maint-2.45 Command line completion support for zsh (in contrib/) has been updated to stop exposing internal state to end-user shell interaction. * dk/zsh-git-repo-path-fix: completion: zsh: stop leaking local cache variable	2024-06-28 15:53:09 -07:00
Junio C Hamano	079323dc6d	Merge branch 'bc/zsh-compatibility' into maint-2.45 zsh can pretend to be a normal shell pretty well except for some glitches that we tickle in some of our scripts. Work them around so that "vimdiff" and our test suite works well enough with it. * bc/zsh-compatibility: vimdiff: make script and tests work with zsh t4046: avoid continue in &&-chain for zsh	2024-06-28 15:53:09 -07:00
Junio C Hamano	1b1b4d490d	Merge branch 'js/for-each-repo-keep-going' into maint-2.45 A scheduled "git maintenance" job is expected to work on all repositories it knows about, but it stopped at the first one that errored out. Now it keeps going. * js/for-each-repo-keep-going: maintenance: running maintenance should not stop on errors for-each-repo: optionally keep going on an error	2024-06-28 15:53:08 -07:00
Junio C Hamano	2a78de0d9f	Merge branch 'aj/stash-staged-fix' into maint-2.45 "git stash -S" did not handle binary files correctly, which has been corrected. * aj/stash-staged-fix: stash: fix "--staged" with binary files	2024-06-28 15:53:07 -07:00
Junio C Hamano	a41463e437	Merge branch 'xx/disable-replace-when-building-midx' into maint-2.45 The procedure to build multi-pack-index got confused by the replace-refs mechanism, which has been corrected by disabling the latter. * xx/disable-replace-when-building-midx: midx: disable replace objects	2024-06-28 15:53:07 -07:00
Junio C Hamano	332bcf74ea	Merge branch 'pw/rebase-m-signoff-fix' into maint-2.45 "git rebase --signoff" used to forget that it needs to add a sign-off to the resulting commit when told to continue after a conflict stops its operation. * pw/rebase-m-signoff-fix: rebase -m: fix --signoff with conflicts sequencer: store commit message in private context sequencer: move current fixups to private context sequencer: start removing private fields from public API sequencer: always free "struct replay_opts"	2024-06-28 15:53:06 -07:00
Derrick Stolee	114bff72ac	sparse-index: improve lstat caching of sparse paths The clear_skip_worktree_from_present_files() method was first introduced in `af6a51875a` (repo_read_index: clear SKIP_WORKTREE bit from files present in worktree, 2022-01-14) to allow better interaction with the working directory in the presence of paths outside of the sparse-checkout. The initial implementation would lstat() every single SKIP_WORKTREE path to see if it existed; if it ran across a sparse directory that existed (when a sparse index was in use), then it would expand the index and then check every SKIP_WORKTREE path. Since these lstat() calls were very expensive, this was improved in `d79d299352` (Accelerate clear_skip_worktree_from_present_files() by caching, 2022-01-14) by caching directories that do not exist so it could avoid lstat()ing any files under such directories. However, there are some inefficiencies in that caching mechanism. The caching mechanism stored only the parent directory as not existing, even if a higher parent directory also does not exist. This means that wasted lstat() calls would occur when the paths passed to path_found() change immediate parent directories but within the same parent directory that does not exist. To create an example repository that demonstrates this problem, it helps to have a directory outside of the sparse-checkout that contains many deep paths. In particular, the first paths (in lexicographic order) underneath the sparse directory should have deep directory structures, maximizing the difference between the old caching algorithm that looks to a single parent and the new caching algorithm that looks to the top-most missing directory. The performance test script p2000-sparse-operations.sh takes the sample repository and copies its HEAD to several copies nested in directories of the form f<i>/f<j>/f<k> where i, j, and k are numbers from 1 to 4. The sparse-checkout cone is then selected as "f2/f4/". Creating "f1/f1/" will trigger the behavior and also lead to some interesting cases for the caching algorithm since "f1/f1/" exists but "f1/f2/" and "f3/" do not. This is difficult to notice when running performance tests using the Git repository (or a blow-up of the Git repository, as in p2000-sparse-operations.sh) because Git has a very shallow directory structure. This change reorganizes the caching algorithm to focus on storing the highest level leading directory that does not exist; specifically this means that that directory's parent _does_ exist. By doing a little extra work on a path passed to path_found(), we can short-circuit all of the paths passed to path_found() afterwards that match a prefix with that non-existing directory. When in a repository where the first sparse file is likely to have a much deeper path than the first non-existing directory, this can realize significant gains. The details of this algorithm require careful attention, so the new implementation of path_found() has detailed comments, including the use of a new max_common_dir_prefix() method that may be of independent interest. It's worth noting that this is not universally positive, since we are doing extra lstat() calls to establish the exact path to cache. In the blow-up of the Git repository, we can see that the lstat count _increases_ from 28 to 31. However, these numbers were already artificially low. Contributor Elijah Newren created a publicly-available test repository that demonstrates the difference in these caching algorithms in the most extreme way. To test, follow these steps: git clone --sparse https://github.com/newren/gvfs-like-git-bomb cd gvfs-like-git-bomb ./runme.sh # NOTE: check scripts before running! At this point, assuming you do not have index.sparse=true set globally, the index has one million paths with the SKIP_WORKTREE bit and they will all be sent to path_found() in the sparse loop. You can measure this by running 'git status' with GIT_TRACE2_PERF=1: Sparse files in the index: 1,000,000 sparse_lstat_count (before): 200,000 sparse_lstat_count (after): 2 And here are the performance numbers: Benchmark 1: old Time (mean ± σ): 397.5 ms ± 4.1 ms Range (min … max): 391.2 ms … 404.8 ms 10 runs Benchmark 2: new Time (mean ± σ): 252.7 ms ± 3.1 ms Range (min … max): 249.4 ms … 259.5 ms 11 runs Summary 'new' ran 1.57 ± 0.02 times faster than 'old' By modifying this example further, we can demonstrate a more realistic example and include the sparse index expansion. Continue by creating this directory, confusing both caching algorithms somewhat: mkdir -p bomb/d/e/f/a/a Then re-run the 'git status' tests to see these statistics: Sparse files in the index: 1,000,000 sparse_lstat_count (before): 724,010 sparse_lstat_count (after): 106 Benchmark 1: old Time (mean ± σ): 753.0 ms ± 3.5 ms Range (min … max): 749.7 ms … 760.9 ms 10 runs Benchmark 2: new Time (mean ± σ): 201.4 ms ± 3.2 ms Range (min … max): 196.0 ms … 207.9 ms 14 runs Summary 'new' ran 3.74 ± 0.06 times faster than 'old' Note that if this repository had a sparse index enabled, the additional cost of expanding the sparse index affects the total time of these commands by over four seconds, significantly diminishing the benefit of the caching algorithm. Having existing paths outside of the sparse-checkout is a known performance issue for the sparse index and is a known trade-off for the performance benefits given when no such paths exist. Using an internal monorepo with over two million paths at HEAD and a typical sparse-checkout cone such that the sparse index contains ~190,000 entries (including over two thousand sparse trees), I was able to measure these lstat counts when one sparse directory actually exists on disk: Sparse files in expanded index: 1,841,997 full_lstat_count (before): 1,188,161 full_lstat_count (after): 4,404 This resulted in this absolute time change, on a warm disk: Time in full loop (before): 13.481 s Time in full loop (after): 0.081 s (These times were calculated on a Windows machine, where lstat() is slower than a similar Linux machine.) Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-28 12:32:12 -07:00
Derrick Stolee	c4e8c42c19	sparse-index: count lstat() calls The clear_skip_worktree.. methods already report some statistics about how many cache entries are checked against path_found() due to having the skip-worktree bit set. However, due to path_found() performing some caching, this isn't the only information that would be helpful to report. Add a new lstat_count member to the path_found_data struct to count the number of times path_found() calls lstat(). This will be helpful to help explain performance problems in this method as well as to demonstrate future changes to the caching algorithm in a more concrete way than end-to-end timings. Signed-off-by: Derrick Stolee <stolee@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-28 12:32:12 -07:00
Derrick Stolee	23dd6f8bcc	sparse-index: use strbuf in path_found() The path_found() method previously reused strings from the cache entries the calling methods were using. This prevents string manipulation in place and causes some odd reallocation before the final lstat() call in the method. Refactor the method to use strbufs and copy the path into the strbuf, but also only the parent directory and not the whole path. This looks like extra copying when assigning the path to the strbuf, but we save an allocation by dropping the 'tmp' string, and we are "reusing" the copy from 'tmp' to put the data in the strbuf. Signed-off-by: Derrick Stolee <stolee@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-28 12:32:11 -07:00
Derrick Stolee	b746a85d9a	sparse-index: refactor path_found() In advance of changing the behavior of path_found(), take all of the intermediate data values and group them into a single struct. This simplifies the method prototype as well as the initialization. Future changes can be made directly to the struct and method without changing the callers with this approach. Note that the clear_path_found_data() method is currently empty, as there is nothing to free. This method is a placeholder for future changes that require a non-trivial implementation. Its stub is created now so consumers could call it now and not change in future changes. Signed-off-by: Derrick Stolee <stolee@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-28 12:32:11 -07:00
Derrick Stolee	532e216986	sparse-checkout: refactor skip worktree retry logic The clear_skip_worktree_from_present_files() method was introduced in `af6a51875a` (repo_read_index: clear SKIP_WORKTREE bit from files present in worktree, 2022-01-14) to help cases where sparse-checkout is enabled but some paths outside of the sparse-checkout also exist on disk. This operation can be slow as it needs to check path existence in a way not stored in the index, so caching was introduced in `d79d299352` (Accelerate clear_skip_worktree_from_present_files() by caching, 2022-01-14). This check is particularly confusing in the presence of a sparse index, as a sparse tree entry corresponding to an existing directory must first be expanded to a full index before examining the paths within. This is currently implemented using a 'goto' and a boolean variable to ensure we restart only once. Even with that caching, it was noticed that this could take a long time to execute. `89aaab11a3` (index: add trace2 region for clear skip worktree, 2022-11-03) introduced trace2 regions to measure this time. Further, the way the loop repeats itself was slightly confusing and prone to breakage, so a BUG() statement was added in `8c7abdc596` (index: raise a bug if the index is materialised more than once, 2022-11-03) to be sure that the second run of the loop does not hit any sparse trees. One thing that can be confusing about the current setup is that the trace2 regions nest and it is not clear that a second loop is running after a sparse index is expanded. Here is an example of what the regions look like in a typical case: \| region_enter \| ... \| label:clear_skip_worktree_from_present_files \| region_enter \| ... \| ..label:update \| region_leave \| ... \| ..label:update \| region_enter \| ... \| ..label:ensure_full_index \| region_enter \| ... \| ....label:update \| region_leave \| ... \| ....label:update \| region_leave \| ... \| ..label:ensure_full_index \| data \| ... \| ..sparse_path_count:1 \| data \| ... \| ..sparse_path_count_full:269538 \| region_leave \| ... \| label:clear_skip_worktree_from_present_files One thing that is particularly difficult to understand about these regions is that most of the time is spent between the close of the ensure_full_index region and the reporting of the end data. This is because of the restart of the loop being within the same region as the first iteration of the loop. This change refactors the method into two separate methods that are traced separately. This will be more important later when we change other features of the methods, but for now the only functional change is the difference in the structure of the trace regions. After this change, the same telemetry section is split into three distinct chunks: \| region_enter \| ... \| label:clear_skip_worktree_from_present_files_sparse \| data \| ... \| ..sparse_path_count:1 \| region_leave \| ... \| label:clear_skip_worktree_from_present_files_sparse \| region_enter \| ... \| label:update \| region_leave \| ... \| label:update \| region_enter \| ... \| label:ensure_full_index \| region_enter \| ... \| ..label:update \| region_leave \| ... \| ..label:update \| region_leave \| ... \| label:ensure_full_index \| region_enter \| ... \| label:clear_skip_worktree_from_present_files_full \| data \| ... \| ..full_path_count:269538 \| region_leave \| ... \| label:clear_skip_worktree_from_present_files_full Here, we see the sparse loop terminating early with its first sparse path being a sparse directory containing a file. Then, that loop's region terminates before ensure_full_index begins (in this case, the cache-tree must also be computed). Then, _after_ the index is expanded, the full loop begins with its own region. Signed-off-by: Derrick Stolee <stolee@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-28 12:32:10 -07:00
Junio C Hamano	daed0c68e9	The seventeenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-27 09:20:00 -07:00
Junio C Hamano	b781a3e08e	Merge branch 'jk/fetch-pack-fsck-wo-lock-pack' "git fetch-pack -k -k" without passing "--lock-pack" (which we never do ourselves) did not work at all, which has been corrected. * jk/fetch-pack-fsck-wo-lock-pack: fetch-pack: fix segfault when fscking without --lock-pack	2024-06-27 09:19:59 -07:00
Junio C Hamano	5dce36e04f	Merge branch 'rs/remove-unused-find-header-mem' Code clean-up. * rs/remove-unused-find-header-mem: commit: remove find_header_mem()	2024-06-27 09:19:59 -07:00
Junio C Hamano	b8d1a1b06c	Merge branch 'jk/t5500-typofix' A helper function shared between two tests had a copy-paste bug, which has been corrected. * jk/t5500-typofix: t5500: fix mistaken $SERVER reference in helper function	2024-06-27 09:19:59 -07:00
Junio C Hamano	424a13db64	Merge branch 'js/mingw-remove-unused-extern-decl' An unused extern declaration for mingw has been removed to prevent it from causing build failure. * js/mingw-remove-unused-extern-decl: mingw: drop bogus (and unneeded) declaration of `_pgmptr`	2024-06-27 09:19:58 -07:00
Junio C Hamano	6c0bfce914	Merge branch 'kz/merge-fail-early-upon-refresh-failure' When "git merge" sees that the index cannot be refreshed (e.g. due to another process doing the same in the background), it died but after writing MERGE_HEAD etc. files, which was useless for the purpose to recover from the failure. * kz/merge-fail-early-upon-refresh-failure: merge: avoid write merge state when unable to write index	2024-06-27 09:19:58 -07:00
Jeff King	407cdbd271	t/lib-bundle-uri: use local fake bundle URLs A few of the bundle URI tests point config at a fake bundle; they care only that the client has been configured with _some_ bundle, but it doesn't have to actually contain objects. For the file:// tests, we use "$BUNDLE_URI_REPO_URI/fake.bdl", a non-existent file inside the actual remote repo. But for git:// and http:// tests, we use "https://example.com/fake.bdl". This works OK in practice, but it means we actually make a request to example.com (which returns a placeholder HTML response). That can be annoying when running the test suite on a spotty network (it doesn't produce a wrong result, since we expect it to fail, but it may introduce delays). We can reduce our dependency on the outside world by using a local URL. It would work to just do "file://$PWD/fake.bdl" here, since the bundle code does not care about the actual location. But in the long run I suspect we may have more restrictions on which protocols can be passed around as bundle URIs. So instead, let's stick with the file:// repo's pattern and just point to a bogus name based on the remote repo's URL. For http this makes perfect sense; we'll make a request to the local http server and find that there's nothing there. For git:// it's a little weird, as you wouldn't normally access a bundle file over git:// at all. But it's probably the most reasonable guess we can make for now, and anybody who tightens protocol selection later will know better what's the best path forward. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-26 14:31:18 -07:00
Jeff King	e6653ec3c6	t5551: do not confirm that bogus url cannot be used t5551 tries to access a URL with a bogus hostname and confirms that http.curloptResolve lets us use this otherwise unresolvable name. Before doing so, though, we confirm that trying to access the bogus hostname without http.curloptResolve fails as expected. This isn't testing Git at all, but is confirming the test's assumptions. That's often a good thing to do, but in this case it means that we'll actually try to resolve the external name. Even though it's unlikely that "gitbogusexamplehost.invalid" would ever resolve, the DNS lookup itself may take time. It's probably reasonable to just assume that this obviously-bogus name would not actually resolve in practice, which lets us reduce our test suite's dependency on the outside world. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-26 14:31:18 -07:00
Jeff King	63ec97faf7	t5553: use local url for invalid fetch We test how "fetch --set-upstream" behaves when given an invalid URL, using the bogus URL "http://nosuchdomain.example.com". But finding out that it is invalid requires an actual DNS lookup. Reduce our dependency on external factors by using an invalid local filesystem URL, which works just as well for our purposes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-26 14:31:17 -07:00
Abhijeet Sonar	b8ae42e292	describe: refresh the index when 'broken' flag is used When describe is run with 'dirty' flag, we refresh the index to make sure it is in sync with the filesystem before determining if the working tree is dirty. However, this is not done for the codepath where the 'broken' flag is used. This causes `git describe --broken --dirty` to false positively report the worktree being dirty if a file has different stat info than what is recorded in the index. Running `git update-index -q --refresh` to refresh the index before running diff-index fixes the problem. Also add tests to deliberately update stat info of a file before running describe to verify it behaves correctly. Reported-by: Paul Millar <paul.millar@desy.de> Suggested-by: Junio C Hamano <gitster@pobox.com> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-26 13:04:08 -07:00
Junio C Hamano	72c282098d	archive: document that --add-virtual-file takes full path Tom Scogland noticed that `--add-virtual-file` option uses the path specified as its value as-is, without prepending any value given to the `--prefix` option like `--add-file` does. The behaviour has always been that way since the option was introduced, but the documentation has always been wrong and said that it would use the value of `--prefix` just like `--add-file` does. We could modify the behaviour to make it literally work like the documentation said, but it would break existing scripts the users use. Noticed-by: Tom Scogland <scogland1@llnl.gov> Acked-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-26 12:56:45 -07:00
Darcy Burke	9d69789770	date: detect underflow/overflow when parsing dates with timezone offset Overriding the date of a commit to be close to "1970-01-01 00:00:00" with a large enough positive timezone for the equivelant GMT time to be before the epoch is considered valid by `parse_date_basic`. Similar behaviour occurs when using a date close to "2099-12-31 23:59:59" (the maximum date allowed by `tm_to_time_t`) with a large enough negative timezone offset. This leads to an integer underflow or underflow respectively in the commit timestamp, which is not caught by `git-commit`, but will cause other services to fail, such as `git-fsck`, which, for the first case, reports "badDateOverflow: invalid author/committer line - date causes integer overflow". Instead check the timezone offset and fail if the resulting time comes before the epoch "1970-01-01T00:00:00Z" or after the maximum date "2099-12-31T23:59:59Z". Using the REQUIRE_64BIT_TIME prerequisite, make sure that the tests near the end of Git time (aka end of year 2099) are not attempted on purely 32-bit systems, as they cannot express timestamp beyond 2038 anyway. Signed-off-by: Darcy Burke <acednes@gmail.com> [jc: fixups for 32-bit platforms] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-25 17:07:41 -07:00
Junio C Hamano	a59275d5d6	t0006: simplify prerequisites The system must support 64-bit time and its time_t must be 64-bit wide to pass these tests. Combine these two prerequisites together to simplify the tests. In theory, they could be fulfilled independently and tests could require only one without the other, but in practice, these must come hand-in-hand. Update the "check_parse" test helper to pay attention to the REQUIRE_64BIT_TIME variable, which can be set to the HAVE_64BIT_TIME prerequisite so that a parse test can be skipped on 32-bit systems. This will be used in the next step to skip tests for timestamps near the end of year 2099, as 32-bit systems will not be able to express a timestamp beyond 2038 anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-25 17:07:26 -07:00
Taylor Blau	9c8a9ec787	bloom: introduce `deinit_bloom_filters()` After we are done using Bloom filters, we do not currently clean up any memory allocated by the commit slab used to store those filters in the first place. Besides the bloom_filter structures themselves, there is mostly nothing to free() in the first place, since in the read-only path all Bloom filter's `data` members point to a memory mapped region in the commit-graph file itself. But when generating Bloom filters from scratch (or initializing truncated filters) we allocate additional memory to store the filter's data. Keep track of when we need to free() this additional chunk of memory by using an extra pointer `to_free`. Most of the time this will be NULL (indicating that we are representing an existing Bloom filter stored in a memory mapped region). When it is non-NULL, free it before discarding the Bloom filters slab. Suggested-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-25 13:52:06 -07:00
Taylor Blau	5421e7c3a1	commit-graph: reuse existing Bloom filters where possible In an earlier commit, a bug was described where it's possible for Git to produce non-murmur3 hashes when the platform's "char" type is signed, and there are paths with characters whose highest bit is set (i.e. all characters >= 0x80). That patch allows the caller to control which version of Bloom filters are read and written. However, even on platforms with a signed "char" type, it is possible to reuse existing Bloom filters if and only if there are no changed paths in any commit's first parent tree-diff whose characters have their highest bit set. When this is the case, we can reuse the existing filter without having to compute a new one. This is done by marking trees which are known to have (or not have) any such paths. When a commit's root tree is verified to not have any such paths, we mark it as such and declare that the commit's Bloom filter is reusable. Note that this heuristic only goes in one direction. If neither a commit nor its first parent have any paths in their trees with non-ASCII characters, then we know for certain that a path with non-ASCII characters will not appear in a tree-diff against that commit's first parent. The reverse isn't necessarily true: just because the tree-diff doesn't contain any such paths does not imply that no such paths exist in either tree. So we end up recomputing some Bloom filters that we don't strictly have to (i.e. their bits are the same no matter which version of murmur3 we use). But culling these out is impossible, since we'd have to perform the full tree-diff, which is the same effort as computing the Bloom filter from scratch. But because we can cache our results in each tree's flag bits, we can often avoid recomputing many filters, thereby reducing the time it takes to run $ git commit-graph write --changed-paths --reachable when upgrading from v1 to v2 Bloom filters. To benchmark this, let's generate a commit-graph in linux.git with v1 changed-paths in generation order[^1]: $ git clone git@github.com:torvalds/linux.git $ cd linux $ git commit-graph write --reachable --changed-paths $ graph=".git/objects/info/commit-graph" $ mv $graph{,.bak} Then let's time how long it takes to go from v1 to v2 filters (with and without the upgrade path enabled), resetting the state of the commit-graph each time: $ git config commitGraph.changedPathsVersion 2 $ hyperfine -p 'cp -f $graph.bak $graph' -L v 0,1 \ 'GIT_TEST_UPGRADE_BLOOM_FILTERS={v} git.compile commit-graph write --reachable --changed-paths' On linux.git (where there aren't any non-ASCII paths), the timings indicate that this patch represents a speed-up over recomputing all Bloom filters from scratch: Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] Range (min … max): 124.621 s … 125.227 s 3 runs Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] Range (min … max): 79.112 s … 79.437 s 3 runs Summary 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' On git.git, we do have some non-ASCII paths, giving us a more modest improvement from 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. On my machine, the stats for git.git are: - 8,285 Bloom filters computed from scratch - 10 Bloom filters generated as empty - 4 Bloom filters generated as truncated due to too many changed paths - 65,114 Bloom filters were reused when transitioning from v1 to v2. [^1]: Note that this is is important, since `--stdin-packs` or `--stdin-commits` orders commits in the commit-graph by their pack position (with `--stdin-packs`) or in the raw input (with `--stdin-commits`). Since we compute Bloom filters in the same order that commits appear in the graph, we must see a commit's (first) parent before we process the commit itself. This is only guaranteed to happen when sorting commits by their generation number. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-25 13:52:06 -07:00
Taylor Blau	df3df2dcf4	object.h: fix mis-aligned flag bits table Bit position 23 is one column too far to the left. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-25 13:52:06 -07:00

... 21 22 23 24 25 ...

75032 Commits