Commit Graph

1014 Commits

Author SHA1 Message Date
aae91a86fb Merge branch 'ds/name-hash-tweaks'
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.

* ds/name-hash-tweaks:
  pack-objects: prevent name hash version change
  test-tool: add helper for name-hash values
  p5313: add size comparison test
  pack-objects: add GIT_TEST_NAME_HASH_VERSION
  repack: add --name-hash-version option
  pack-objects: add --name-hash-version option
  pack-objects: create new name-hash function version
2025-02-12 10:08:51 -08:00
0578f1e66a global: adapt callers to use generic hash context helpers
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:11 -08:00
7346e340f1 hash: stop typedeffing the hash context
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.

Drop the typedef and adapt all callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:10 -08:00
0cbcba5455 Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanup
* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-01-31 10:05:46 -08:00
f046ab2dd4 Merge branch 'ds/path-walk-1'
Introduce a new API to visit objects in batches based on a common
path, or by type.

* ds/path-walk-1:
  path-walk: drop redundant parse_tree() call
  path-walk: reorder object visits
  path-walk: mark trees and blobs as UNINTERESTING
  path-walk: visit tags and cached objects
  path-walk: allow consumer to specify object types
  t6601: add helper for testing path-walk API
  test-lib-functions: add test_cmp_sorted
  path-walk: introduce an object walk by path
2025-01-29 14:05:09 -08:00
f0a371a39d Merge branch 'jc/show-usage-help'
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others.  The built-in commands
have been fixed to show them on the standard output consistently.

* jc/show-usage-help:
  builtin: send usage() help text to standard output
  oddballs: send usage() help text to standard output
  builtins: send usage_with_options() help text to standard output
  usage: add show_usage_if_asked()
  parse-options: add show_usage_with_options_if_asked()
  t0012: optionally check that "-h" output goes to stdout
2025-01-28 13:02:22 -08:00
7f9870794f test-tool: add helper for name-hash values
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.

Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values. The paths
used for this test are carefully selected to demonstrate some of the
behavior differences of the two current name hash versions, including
which conditions will cause them to collide.

Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.

My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:

Test                               this tree
--------------------------------------------------
5314.1: paths at head                         4.5K
5314.2: distinct hash value: v1               4.1K
5314.3: maximum multiplicity: v1                13
5314.4: distinct hash value: v2               4.2K
5314.5: maximum multiplicity: v2                 9

Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.

In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:

Test                               this tree
--------------------------------------------------
5314.1: paths at head                        19.5K
5314.2: distinct hash value: v1               8.2K
5314.3: maximum multiplicity: v1               279
5314.4: distinct hash value: v2              17.8K
5314.5: maximum multiplicity: v2                44

[1] https://github.com/microsoft/fluentui

That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.

With the v2 name hash function, the maximum multiplicity lowers to 44,
leaving some room for further improvement.

In a more extreme example, an internal monorepo had a much worse
collision rate:

Test                               this tree
--------------------------------------------------
5314.1: paths at head                       227.3K
5314.2: distinct hash value: v1              72.3K
5314.3: maximum multiplicity: v1             14.4K
5314.4: distinct hash value: v2             166.5K
5314.5: maximum multiplicity: v2               138

Here, we can see that the v2 name hash function provides somem
improvements, but there are still a number of collisions that could lead
to repacking problems at this scale.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 13:21:43 -08:00
3339180b28 t/helper/test-hash.c: use unsafe_hash_algo()
Remove a series of conditionals within the shared cmd_hash_impl() helper
that powers the 'sha1' and 'sha1-unsafe' helpers.

Instead, replace them with a single conditional that transforms the
specified hash algorithm into its unsafe variant. Then all subsequent
calls can directly use whatever function it wants to call without having
to decide between the safe and unsafe variants.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
d9213e4716 t/helper/test-tool: implement sha1-unsafe helper
With the new "unsafe" SHA-1 build knob, it is convenient to have a
test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.

Implement that helper by altering the implementation of that test-tool
(in cmd_hash_impl(), which is generic and parameterized over different
hash functions) to conditionally run the unsafe variants of the chosen
hash function, and expose the new behavior via a new 'sha1-unsafe' test
helper.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:16 -08:00
7b39a128c8 Merge branch 'ps/the-repository'
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.

* ps/the-repository:
  match-trees: stop using `the_repository`
  graph: stop using `the_repository`
  add-interactive: stop using `the_repository`
  tmp-objdir: stop using `the_repository`
  resolve-undo: stop using `the_repository`
  credential: stop using `the_repository`
  mailinfo: stop using `the_repository`
  diagnose: stop using `the_repository`
  server-info: stop using `the_repository`
  send-pack: stop using `the_repository`
  serve: stop using `the_repository`
  trace: stop using `the_repository`
  pager: stop using `the_repository`
  progress: stop using `the_repository`
2025-01-21 08:44:54 -08:00
cb441e1ec3 Merge branch 'ps/reftable-get-random-fix'
The code to compute "unique" name used git_rand() which can fail or
get stuck; the callsite does not require cryptographic security.
Introduce the "insecure" mode and use it appropriately.

* ps/reftable-get-random-fix:
  reftable/stack: accept insecure random bytes
  wrapper: allow generating insecure random bytes
2025-01-21 08:44:53 -08:00
b821c999ca builtins: send usage_with_options() help text to standard output
Using the show_usage_with_options_if_asked() helper we introduced
earlier, fix callers of usage_with_options() that want to show the
help text when explicitly asked by the end-user.  The help text now
goes to the standard output stream for them.

The test in t7600 for "git merge -h" may want to be retired, as the
same is covered by t0012 already, but it is specifically testing that
the "-h" option gets a response even with a corrupt index file, so
for now let's leave it there.

Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-17 13:30:03 -08:00
1568d1562e wrapper: allow generating insecure random bytes
The `csprng_bytes()` function generates randomness and writes it into a
caller-provided buffer. It abstracts over a couple of implementations,
where the exact one that is used depends on the platform.

These implementations have different guarantees: while some guarantee to
never fail (arc4random(3)), others may fail. There are two significant
failures to distinguish from one another:

  - Systemic failure, where e.g. opening "/dev/urandom" fails or when
    OpenSSL doesn't have a provider configured.

  - Entropy failure, where the entropy pool is exhausted, and thus the
    function cannot guarantee strong cryptographic randomness.

While we cannot do anything about the former, the latter failure can be
acceptable in some situations where we don't care whether or not the
randomness can be predicted.

Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt
into weak cryptographic randomness. The exact behaviour of the flag
depends on the underlying implementation:

    - `arc4random_buf()` never returns an error, so it doesn't change.

    - `getrandom()` pulls from "/dev/urandom" by default, which never
      blocks on modern systems even when the entropy pool is empty.

    - `getentropy()` seems to block when there is not enough randomness
      available, and there is no way of changing that behaviour.

    - `GtlGenRandom()` doesn't mention anything about its specific
      failure mode.

    - The fallback reads from "/dev/urandom", which also returns bytes in
      case the entropy pool is drained in modern Linux systems.

That only leaves OpenSSL with `RAND_bytes()`, which returns an error in
case the returned data wouldn't be cryptographically safe. This function
is replaced with a call to `RAND_pseudo_bytes()`, which can indicate
whether or not the returned data is cryptographically secure via its
return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag
is set, then we ignore the insecurity and return the data regardless.

It is somewhat questionable whether we really need the flag in the first
place, or whether we wouldn't just ignore the potentially-insecure data.
But the risk of doing that is that we might have or grow callsites that
aren't aware of the potential insecureness of the data in places where
it really matters. So using a flag to opt-in to that behaviour feels
like the more secure choice.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07 09:04:18 -08:00
5e7fe8a7b8 commit-reach: use size_t to track indices when computing merge bases
The functions `repo_get_merge_bases_many()` and friends accepts an array
of commits as well as a parameter that indicates how large that array
is. This parameter is using a signed integer, which leads to a couple of
warnings with -Wsign-compare.

Refactor the code to use `size_t` to track indices instead and adapt
callers accordingly. While most callers are trivial, there are two
callers that require a bit more scrutiny:

  - builtin/merge-base.c:show_merge_base() subtracts `1` from the
    `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if
    the variable was `0` it would wrap. This code is fine though because
    its only caller will execute that code only when `argc >= 2`, and it
    follows that `rev_nr >= 2`, as well.

  - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`.
    Again, there is only a single caller that populates `rev_nr` with
    `good_revs.nr`. And because a bisection always requires at least one
    good revision it follws that `rev_nr >= 1`.

Mark the file as -Wsign-compare-clean.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:12:40 -08:00
4156b6a741 Merge branch 'ps/build-sign-compare'
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23 09:32:11 -08:00
6333e7ae0b path-walk: mark trees and blobs as UNINTERESTING
When the input rev_info has UNINTERESTING starting points, we want to be
sure that the UNINTERESTING flag is passed appropriately through the
objects. To match how this is done in places such as 'git pack-objects', we
use the mark_edges_uninteresting() method.

This method has an option for using the "sparse" walk, which is similar in
spirit to the path-walk API's walk. To be sure to keep it independent, add a
new 'prune_all_uninteresting' option to the path_walk_info struct.

To check how the UNINTERSTING flag is spread through our objects, extend the
'test-tool path-walk' command to output whether or not an object has that
flag. This changes our tests significantly, including the removal of some
objects that were previously visited due to the incomplete implementation.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:05 -08:00
9145660979 path-walk: visit tags and cached objects
The rev_info that is specified for a path-walk traversal may specify
visiting tag refs (both lightweight and annotated) and also may specify
indexed objects (blobs and trees). Update the path-walk API to walk
these objects as well.

When walking tags, we need to peel the annotated objects until reaching
a non-tag object. If we reach a commit, then we can add it to the
pending objects to make sure we visit in the commit walk portion. If we
reach a tree, then we will assume that it is a root tree. If we reach a
blob, then we have no good path name and so add it to a new list of
"tagged blobs".

When the rev_info includes the "--indexed-objects" flag, then the
pending set includes blobs and trees found in the cache entries and
cache-tree. The cache entries are usually blobs, though they could be
trees in the case of a sparse index. The cache-tree stores
previously-hashed tree objects but these are cleared out when staging
objects below those paths. We add tests that demonstrate this.

The indexed objects come with a non-NULL 'path' value in the pending
item. This allows us to prepopulate the 'path_to_lists' strmap with
lists for these paths.

The tricky thing about this walk is that we will want to combine the
indexed objects walk with the commit walk, especially in the future case
of walking objects during a command like 'git repack'.

Whenever possible, we want the objects from the index to be grouped with
similar objects in history. We don't want to miss any paths that appear
only in the index and not in the commit history.

Thus, we need to be careful to let the path stack be populated initially
with only the root tree path (and possibly tags and tagged blobs) and go
through the normal depth-first search. Afterwards, if there are other
paths that are remaining in the paths_to_lists strmap, we should then
iterate through the stack and visit those objects recursively.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:05 -08:00
c8dba310d7 path-walk: allow consumer to specify object types
We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.

This adds the ability to ask for the commits in a list, as well. We
re-use the empty string for this set of objects because these are passed
directly to the callback function instead of being part of the
'path_stack'.

Future changes will add the ability to visit annotated tags.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:05 -08:00
d190124f27 t6601: add helper for testing path-walk API
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.

Store and output a 'batch_nr' value so we can demonstrate that the paths are
grouped together in a batch and not following some other ordering. This
allows us to test the depth-first behavior of the path-walk API. However, we
purposefully do not test the order of the objects in the batch, so the
output is compared to the expected output through a sort.

It is important to mention that the behavior of the API will change soon as
we start to handle UNINTERESTING objects differently, but these tests will
demonstrate the change in behavior.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:04 -08:00
395b584b57 serve: stop using the_repository
Stop using `the_repository` in the "serve" subsystem by passing in a
repository when advertising capabilities or serving requests.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
1f7e6478dc progress: stop using the_repository
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
913a1e157c Merge branch 'ps/build-sign-compare' into ps/the-repository
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18 10:43:16 -08:00
29e5596eb8 Merge branch 'ps/build'
Build procedure update plus introduction of Meson based builds.

* ps/build: (24 commits)
  Introduce support for the Meson build system
  Documentation: add comparison of build systems
  t: allow overriding build dir
  t: better support for out-of-tree builds
  Documentation: extract script to generate a list of mergetools
  Documentation: teach "cmd-list.perl" about out-of-tree builds
  Documentation: allow sourcing generated includes from separate dir
  Makefile: simplify building of templates
  Makefile: write absolute program path into bin-wrappers
  Makefile: allow "bin-wrappers/" directory to exist
  Makefile: refactor generators to be PWD-independent
  Makefile: extract script to generate gitweb.js
  Makefile: extract script to generate gitweb.cgi
  Makefile: extract script to massage Python scripts
  Makefile: extract script to massage Shell scripts
  Makefile: use "generate-perl.sh" to massage Perl library
  Makefile: extract script to massage Perl scripts
  Makefile: consistently use PERL_PATH
  Makefile: generate doc versions via GIT-VERSION-GEN
  Makefile: generate "git.rc" via GIT-VERSION-GEN
  ...
2024-12-15 17:54:33 -08:00
ca43bd2562 Merge branch 'kn/midx-wo-the-repository'
Yet another "pass the repository through the callchain" topic.

* kn/midx-wo-the-repository:
  midx: inline the `MIDX_MIN_SIZE` definition
  midx: pass down `hash_algo` to functions using global variables
  midx: pass `repository` to `load_multi_pack_index`
  midx: cleanup internal usage of `the_repository` and `the_hash_algo`
  midx-write: pass down repository to `write_midx_file[_only]`
  write-midx: add repository field to `write_midx_context`
  midx-write: use `revs->repo` inside `read_refs_snapshot`
  midx-write: pass down repository to static functions
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
2024-12-13 07:33:44 -08:00
de9278127e Merge branch 'ps/reftable-detach'
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.

* ps/reftable-detach:
  reftable/system: provide thin wrapper for lockfile subsystem
  reftable/stack: drop only use of `get_locked_file_path()`
  reftable/system: provide thin wrapper for tempfile subsystem
  reftable/stack: stop using `fsync_component()` directly
  reftable/system: stop depending on "hash.h"
  reftable: explicitly handle hash format IDs
  reftable/system: move "dir.h" to its only user
2024-12-10 10:04:56 +09:00
904339edbd Introduce support for the Meson build system
Introduce support for the Meson build system, a "modern" meta build
system that supports many different platforms, including Linux, macOS,
Windows and BSDs. Meson supports different backends, including Ninja,
Xcode and Microsoft Visual Studio. Several common IDEs provide an
integration with it.

The biggest contender compared to Meson is probably CMake as outlined in
our "Documentation/technical/build-systems.txt" file. Based on my own
personal experience from working with both build systems extensively I
strongly favor Meson over CMake. In my opinion, it feels significantly
easier to use with a syntax that feels more like a "real" programming
language. The second big reason is that Meson supports Rust natively,
which may prove to be important given that the project may pick up Rust
as another language eventually.

Using Meson is rather straight-forward. An example:

    ```
    # Meson uses out-of-tree builds. You can set up multiple build
    # directories, how you name them is completely up to you.
    $ mkdir build
    $ cd build
    $ meson setup .. -Dprefix=/tmp/git-installation

    # Build the project. This also provides several other targets like
    e.g. `install` or `test`.
    $ ninja

    # Meson has been wired up to support execution of our test suites.
    # Both our unit tests and our integration tests are supported.
    # Running `meson test` without any arguments will execute all tests,
    # but the syntax supports globbing to select only some tests.
    $ meson test 't-*'
    # Execute single test interactively to allow for debugging.
    $ meson test 't0000-*' --interactive --test-args=-ix
    ```

The build instructions have been successfully tested on the following
systems, tests are passing:

  - Apple macOS 10.15.

  - FreeBSD 14.1.

  - NixOS 24.11.

  - OpenBSD 7.6.

  - Ubuntu 24.04.

  - Windows 10 with Cygwin.

  - Windows 10 with MinGW64, except for t9700, which is also broken with
    our Makefile.

  - Windows 10 with Visual Studio 2022 toolchain, using the Native Tools
    Command Prompt with `meson setup --vsenv`. Tests pass, except for
    t9700.

  - Windows 10 with Visual Studio 2022 solution, using the Native Tools
    Command Prompt with `meson setup --backend vs2022`. Tests pass,
    except for t9700.

  - Windows 10 with VS Code, using the Meson plug-in.

It is expected that there will still be rough edges in the current
version. If this patch lands the expectation is that it will coexist
with our other build systems for a while. Like this, distributions can
slowly migrate over to Meson and report any findings they have to us
such that we can continue to iterate. A potential cutoff date for other
build systems may be Git 3.0.

Some notes:

  - The installed distribution is structured somewhat differently than
    how it used to be the case. All of our binaries are installed into
    `$libexec/git-core`, while all binaries part of `$bindir` are now
    symbolic links pointing to the former. This rule is consistent in
    itself and thus easier to reason about.

  - We do not install dashed binaries into `$libexec/git-core` anymore,
    so there won't e.g. be a symlink for git-add(1). These are not
    required by modern Git and there isn't really much of a use case for
    those anymore. By not installing those symlinks we thus start the
    deprecation of this layout.

  - We're targeting Meson 1.3.0, which has been released relatively
    recently November 2023. The only feature we use from that version is
    `fs.relative_to()`, which we could replace if necessary. If so, we
    could start to target Meson 1.0.0 and newer, released in December
    2022.

  - The whole build instructions count around 3300 lines, half of which
    is listing all of our code and test files. Our Makefiles are around
    5000 lines, autoconf adds another 1300 lines. CMake in comparison
    has only 1200 linescode, but it avoids listing individual files and
    does not wire up auto-configuration as extensively as the Meson
    instructions do.

  - We bundle a set of subproject wrappers for curl, expat, openssl,
    pcre2 and zlib. This allows developers to build Git without these
    dependencies preinstalled, and Meson will fetch and build them
    automatically. This is especially helpful on Windows.

Helped-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:14 +09:00
e03d2a9ccb t/helper: don't depend on implicit wraparound
In our test helpers we have two cases where we assign -1 to an `unsigned
long`. The intent is to essentially mean "unbounded output", which is
achieved via implicit wraparound of the value.

This pattern causes warnings with -Wsign-compare though. Adapt it and
instead use `ULONG_MAX` explicitly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:05 +09:00
80c9e70ebe global: trivial conversions to fix -Wsign-compare warnings
We have a bunch of loops which iterate up to an unsigned boundary using
a signed index, which generates warnigs because we compare a signed and
unsigned value in the loop condition. Address these sites for trivial
cases and enable `-Wsign-compare` warnings for these code units.

This patch only adapts those code units where we can drop the
`DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:04 +09:00
47d72a74a7 diff.h: fix index used to loop through unsigned integer
The `struct diff_flags` structure is essentially an array of flags, all
of which have the same type. We can thus use `sizeof()` to iterate
through all of the flags, which we do in `diff_flags_or()`. But while
the statement returns an unsigned integer, we used a signed integer to
iterate through the flags, which generates a warning.

Fix this by using `size_t` for the index instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:03 +09:00
41f43b8243 global: mark code units that generate warnings with -Wsign-compare
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
d5c2ca576a midx: pass repository to load_multi_pack_index
The `load_multi_pack_index` function in midx uses `the_repository`
variable to access the `repository` struct. Modify the function and its
callee's to send the `repository` field.

This moves usage of `the_repository` to the `test-read-midx.c` file.
While that is not optimal, it is okay, since the upcoming commits will
slowly move the usage of `the_repository` up the layers and remove it
eventually.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:20 +09:00
1e18cf4310 Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands'
Built-in Git subcommands are supplied the repository object to work
with; they learned to do the same when they invoke sub-subcommands.

* kn/pass-repo-to-builtin-sub-sub-commands:
  builtin: pass repository to sub commands
2024-12-04 10:14:47 +09:00
7ee055b237 Merge branch 'ps/ref-backend-migration-optim'
The migration procedure between two ref backends has been optimized.

* ps/ref-backend-migration-optim:
  reftable: rename scratch buffer
  refs: adapt `initial_transaction` flag to be unsigned
  reftable/block: optimize allocations by using scratch buffer
  reftable/block: rename `block_writer::buf` variable
  reftable/writer: optimize allocations by using a scratch buffer
  refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
  refs: skip collision checks in initial transactions
  refs: use "initial" transaction semantics to migrate refs
  refs/files: support symbolic and root refs in initial transaction
  refs: introduce "initial" transaction flag
  refs/files: move logic to commit initial transaction
  refs: allow passing flags when setting up a transaction
2024-12-04 10:14:41 +09:00
6f33d8e255 builtin: pass repository to sub commands
In 9b1cb5070f (builtin: add a repository parameter for builtin
functions, 2024-09-13) the repository was passed down to all builtin
commands. This allowed the repository to be passed down to lower layers
without depending on the global `the_repository` variable.

Continue this work by also passing down the repository parameter from
the command to sub-commands. This will help pass down the repository to
other subsystems and cleanup usage of global variables like
'the_repository' and 'the_hash_algo'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 10:36:08 +09:00
818e165898 t/helper: fix leaking commit graph in "read-graph" subcommand
We're leaking the commit-graph in the "test-helper read-graph"
subcommand, but as the leak is annotated with `UNLEAK()` the leak
sanitizer doesn't complain.

Fix the leak by calling `free_commit_graph()`. Besides getting rid of
the `UNLEAK()` annotation, it also increases code coverage because we
properly release resources as Git would do it, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:45 +09:00
e4929cdf79 refs: skip collision checks in initial transactions
Reference transactions use `refs_verify_refname_available()` to check
for colliding references. This check consists of two parts:

  - Checks for whether multiple ref updates in the same transaction
    conflict with each other.

  - Checks for whether existing refs conflict with any refs part of the
    transaction.

While we generally cannot avoid the first check, the second check is
superfluous in cases where the transaction is an initial one in an
otherwise empty ref store. The check results in multiple ref reads as
well as the creation of a ref iterator for every ref we're checking,
which adds up quite fast when performing the check for many refs.

Introduce a new flag that allows us to skip this check and wire it up in
such that the backends pass it when running an initial transaction. This
leads to significant speedups when migrating ref storage backends. From
"files" to "reftable":

    Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):     472.4 ms ±   6.7 ms    [User: 175.9 ms, System: 285.2 ms]
      Range (min … max):   463.5 ms … 483.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      86.1 ms ±   1.9 ms    [User: 67.9 ms, System: 16.0 ms]
      Range (min … max):    82.9 ms …  90.9 ms    29 runs

    Summary
      migrate files:reftable (refcount = 100000, revision = HEAD) ran
        5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~)

And from "reftable" to "files":

    Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
      Time (mean ± σ):     452.7 ms ±   3.4 ms    [User: 209.9 ms, System: 235.4 ms]
      Range (min … max):   445.9 ms … 457.5 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      95.2 ms ±   2.2 ms    [User: 73.6 ms, System: 20.6 ms]
      Range (min … max):    91.7 ms … 100.8 ms    28 runs

    Summary
      migrate reftable:files (refcount = 100000, revision = HEAD) ran
        4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:16 +09:00
c2f08236ed reftable/system: stop depending on "hash.h"
We include "hash.h" in "reftable/system.h" such that we can use hash
format IDs as well as the raw size of SHA1 and SHA256. As we are in the
process of converting the reftable library to become standalone we of
course cannot rely on those constants anymore.

Introduce a new `enum reftable_hash` to replace internal uses of the
hash format IDs and new constants that replace internal uses of the hash
size. Adapt the reftable backend to set up the correct hash function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:10 +09:00
6890c99e38 Merge branch 'ps/leakfixes-part-9'
More leakfixes.

* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-13 08:35:31 +09:00
2664f2a0cb Merge branch 'ps/leakfixes-part-9' into ps/leakfixes-part-10
* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-07 13:25:01 +09:00
0bc0fcf0b2 t/helper: fix leaking buffer in "dump-untracked-cache"
We never release the local `struct strbuf base` buffer, thus leaking
memory. Fix this leak.

This leak is exposed by t7063, but plugging it alone does not make the
whole test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:56 -08:00
a53144cf1b t/helper: stop re-initialization of the_repository
While "common-main.c" already initializes `the_repository` for us, we do
so a second time in the "read-cache" test helper. This causes a memory
leak because the old repository's contents isn't released.

Stop calling `initialize_repository()` to plug this leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:56 -08:00
ee3e8c3afa t/helper: fix leaks in "reach" test tool
The "reach" test tool doesn't bother to clean up any of its allocated
resources, causing various leaks. Plug them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:51 -08:00
479ab76c9f packfile: use object_id in find_pack_entry_one()
The main function we use to search a pack index for an object is
find_pack_entry_one(). That function still takes a bare pointer to the
hash, despite the fact that its underlying bsearch_pack() function needs
an object_id struct. And so we end up making an extra copy of the hash
into the struct just to do a lookup.

As it turns out, all callers but one already have such an object_id. So
we can just take a pointer to that struct and use it directly. This
avoids the extra copy and provides a more type-safe interface.

The one exception is get_delta_base() in packfile.c, when we are chasing
a REF_DELTA from inside the pack (and thus we have a pointer directly to
the mmap'd pack memory, not a struct). We can just bump the hashcpy()
from inside find_pack_entry_one() to this one caller that needs it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
4d334e5205 Merge branch 'ak/typofixes'
Typofixes.

* ak/typofixes:
  t: fix typos
  t/helper: fix a typo
  t/perf: fix typos
  t/unit-tests: fix typos
  contrib: fix typos
  compat: fix typos
2024-10-25 14:02:04 -04:00
31bc4454de Merge branch 'ps/leakfixes-part-8'
More leakfixes.

* ps/leakfixes-part-8: (23 commits)
  builtin/send-pack: fix leaking list of push options
  remote: fix leaking push reports
  t/helper: fix leaks in proc-receive helper
  pack-write: fix return parameter of `write_rev_file_order()`
  revision: fix leaking saved parents
  revision: fix memory leaks when rewriting parents
  midx-write: fix leaking buffer
  pack-bitmap-write: fix leaking OID array
  pseudo-merge: fix leaking strmap keys
  pseudo-merge: fix various memory leaks
  line-log: fix several memory leaks
  diff: improve lifecycle management of diff queues
  builtin/revert: fix leaking `gpg_sign` and `strategy` config
  t/helper: fix leaking repository in partial-clone helper
  builtin/clone: fix leaking repo state when cloning with bundle URIs
  builtin/pack-redundant: fix various memory leaks
  builtin/stash: fix leaking `pathspec_from_file`
  submodule: fix leaking submodule entry list
  wt-status: fix leaking buffer with sparse directories
  shell: fix leaking strings
  ...
2024-10-10 14:22:29 -07:00
5575c713c2 Merge branch 'ps/reftable-alloc-failures'
The reftable library is now prepared to expect that the memory
allocation function given to it may fail to allocate and to deal
with such an error.

* ps/reftable-alloc-failures: (26 commits)
  reftable/basics: fix segfault when growing `names` array fails
  reftable/basics: ban standard allocator functions
  reftable: introduce `REFTABLE_FREE_AND_NULL()`
  reftable: fix calls to free(3P)
  reftable: handle trivial allocation failures
  reftable/tree: handle allocation failures
  reftable/pq: handle allocation failures when adding entries
  reftable/block: handle allocation failures
  reftable/blocksource: handle allocation failures
  reftable/iter: handle allocation failures when creating indexed table iter
  reftable/stack: handle allocation failures in auto compaction
  reftable/stack: handle allocation failures in `stack_compact_range()`
  reftable/stack: handle allocation failures in `reftable_new_stack()`
  reftable/stack: handle allocation failures on reload
  reftable/reader: handle allocation failures in `reader_init_iter()`
  reftable/reader: handle allocation failures for unindexed reader
  reftable/merged: handle allocation failures in `merged_table_init_iter()`
  reftable/writer: handle allocation failures in `reftable_new_writer()`
  reftable/writer: handle allocation failures in `writer_index_hash()`
  reftable/record: handle allocation failures when decoding records
  ...
2024-10-10 14:22:25 -07:00
897124aa1b t/helper: fix a typo
Fix a typo in comments: bellow -> below.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10 13:31:13 -07:00
802c0646ac reftable/merged: handle allocation failures in merged_table_init_iter()
Handle allocation failures in `merged_table_init_iter()`. While at it,
merge `merged_iter_init()` into the function. It only has a single
caller and merging them makes it easier to handle allocation failures
consistently.

This change also requires us to adapt `reftable_stack_init_*_iterator()`
to bubble up the new error codes of `merged_table_iter_init()`. Adapt
callsites accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-02 07:53:53 -07:00
365529e1ea Merge branch 'ps/leakfixes-part-7'
More leak-fixes.

* ps/leakfixes-part-7: (23 commits)
  diffcore-break: fix leaking filespecs when merging broken pairs
  revision: fix leaking parents when simplifying commits
  builtin/maintenance: fix leak in `get_schedule_cmd()`
  builtin/maintenance: fix leaking config string
  promisor-remote: fix leaking partial clone filter
  grep: fix leaking grep pattern
  submodule: fix leaking submodule ODB paths
  trace2: destroy context stored in thread-local storage
  builtin/difftool: plug several trivial memory leaks
  builtin/repack: fix leaking configuration
  diffcore-order: fix leaking buffer when parsing orderfiles
  parse-options: free previous value of `OPTION_FILENAME`
  diff: fix leaking orderfile option
  builtin/pull: fix leaking "ff" option
  dir: fix off by one errors for ignored and untracked entries
  builtin/submodule--helper: fix leaking remote ref on errors
  t/helper: fix leaking subrepo in nested submodule config helper
  builtin/submodule--helper: fix leaking error buffer
  builtin/submodule--helper: clear child process when not running it
  submodule: fix leaking update strategy
  ...
2024-10-02 07:46:26 -07:00
12f0fb9538 t/helper: fix leaks in proc-receive helper
Fix trivial leaks in the proc-receive helpe.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:08 -07:00