Commit Graph

75205 Commits

Author SHA1 Message Date
Derrick Stolee
c8dba310d7 path-walk: allow consumer to specify object types
We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.

This adds the ability to ask for the commits in a list, as well. We
re-use the empty string for this set of objects because these are passed
directly to the callback function instead of being part of the
'path_stack'.

Future changes will add the ability to visit annotated tags.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:05 -08:00
Derrick Stolee
d190124f27 t6601: add helper for testing path-walk API
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.

Store and output a 'batch_nr' value so we can demonstrate that the paths are
grouped together in a batch and not following some other ordering. This
allows us to test the depth-first behavior of the path-walk API. However, we
purposefully do not test the order of the objects in the batch, so the
output is compared to the expected output through a sort.

It is important to mention that the behavior of the API will change soon as
we start to handle UNINTERESTING objects differently, but these tests will
demonstrate the change in behavior.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:04 -08:00
Derrick Stolee
cef003d453 test-lib-functions: add test_cmp_sorted
This test helper will be helpful to reduce repeated logic in
t6601-path-walk.sh, but may be helpful elsewhere, too.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:04 -08:00
Derrick Stolee
9d46bc791b path-walk: introduce an object walk by path
In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.

These batches are collected in 'struct type_and_oid_list' objects, which
store an object type and an oid_array of objects.

The data structures are documented in 'struct path_walk_context', but in
summary the most important are:

  * 'paths_to_lists' is a strmap that connects a path to a
    type_and_oid_list for that path. To avoid conflicts in path names,
    we make sure that tree paths end in "/" (except the root path with
    is an empty string) and blob paths do not end in "/".

  * 'path_stack' is a string list that is added to in an append-only
    way. This stores the stack of our depth-first search on the heap
    instead of using recursion.

  * 'path_stack_pushed' is a strmap that stores path names that were
    already added to 'path_stack', to avoid repeating paths in the
    stack. Mostly, this saves us from quadratic lookups from doing
    unsorted checks into the string_list.

The coupling of 'path_stack' and 'path_stack_pushed' is protected by the
push_to_stack() method. Call this instead of inserting into these
structures directly.

The walk_objects_by_path() method initializes these structures and
starts walking commits from the given rev_info struct. The commits are
used to find the list of root trees which populate the start of our
depth-first search.

The core of our depth-first search is in a while loop that continues
while we have not indicated an early exit and our 'path_stack' still has
entries in it. The loop body pops a path off of the stack and "visits"
the path via the walk_path() method.

The walk_path() method gets the list of OIDs from the 'path_to_lists'
strmap and executes the callback method on that list with the given path
and type. If the OIDs correspond to tree objects, then iterate over all
trees in the list and run add_children() to add the child objects to
their own lists, adding new entries to the stack if necessary.

In testing, this depth-first search approach was the one that used the
least memory while iterating over the object lists. There is still a
chance that repositories with too-wide path patterns could cause memory
pressure issues. Limiting the stack size could be done in the future by
limiting how many objects are being considered in-progress, or by
visiting blob paths earlier than trees.

There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 08:37:04 -08:00
Taylor Blau
23d289d273 The sixth batch 2024-10-30 13:36:44 -04:00
Taylor Blau
6aa984af3b Merge branch 'sk/t7011-cleanup'
Test cleanup.

* sk/t7011-cleanup:
  t7011: ensure no whitespace after redirect
2024-10-30 13:08:07 -04:00
Taylor Blau
8d305fdaac Merge branch 'co/t6050-pipefix'
Avoid losing exit status by having Git command being tested on the
upstream side of a pipe.

* co/t6050-pipefix:
  t6050: avoid pipes with upstream Git commands
2024-10-30 13:08:06 -04:00
Taylor Blau
cec4461e3f Merge branch 'ks/t4205-fixup'
Testfix.

* ks/t4205-fixup:
  t4205: fix typo in 'NUL termination with --stat'
2024-10-30 13:08:05 -04:00
Taylor Blau
9947803926 Merge branch 'kh/submitting-patches'
Docfix.

* kh/submitting-patches:
  SubmittingPatches: tags -> trailers
2024-10-30 13:08:04 -04:00
Taylor Blau
6f763d798b Merge branch 'ps/ref-filter-sort'
Teaches the ref-filter machinery to recognize and avoid cases where
sorting would be redundant.

* ps/ref-filter-sort:
  ref-filter: format iteratively with lexicographic refname sorting
2024-10-30 13:08:02 -04:00
Taylor Blau
bc627658b0 Merge branch 'ps/reftable-strbuf'
Implements a new reftable-specific strbuf replacement to reduce
reftable's dependency on Git-specific data structures.

* ps/reftable-strbuf:
  reftable: handle trivial `reftable_buf` errors
  reftable/stack: adapt `stack_filename()` to handle allocation failures
  reftable/record: adapt `reftable_record_key()` to handle allocation failures
  reftable/stack: adapt `format_name()` to handle allocation failures
  t/unit-tests: check for `reftable_buf` allocation errors
  reftable/blocksource: adapt interface name
  reftable: convert from `strbuf` to `reftable_buf`
  reftable/basics: provide new `reftable_buf` interface
  reftable: stop using `strbuf_addf()`
  reftable: stop using `strbuf_addbuf()`
2024-10-30 13:08:01 -04:00
Taylor Blau
6a11438f43 The fifth batch 2024-10-25 14:11:13 -04:00
Taylor Blau
55d12c24d7 Merge branch 'wm/shortlog-hash'
Teaches 'shortlog' to explicitly use SHA-1 when operating outside of
a repository.

* wm/shortlog-hash:
  builtin/shortlog: explicitly set hash algo when there is no repo
2024-10-25 14:02:49 -04:00
Taylor Blau
fcaac14abf Merge branch 'sk/msvc-warnings'
Fixes compile time warnings with 64-bit MSVC.

* sk/msvc-warnings:
  mingw.c: Fix complier warnings for a 64 bit msvc
2024-10-25 14:02:44 -04:00
Taylor Blau
0ab43ed95c Merge branch 'jc/a-commands-without-the-repo'
Commands that can also work outside Git have learned to take the
repository instance "repo" when we know we are in a repository, and
NULL when we are not, in a parameter.  The uses of the_repository
variable in a few of them have been removed using the new calling
convention.

* jc/a-commands-without-the-repo:
  archive: remove the_repository global variable
  annotate: remove usage of the_repository global
  git: pass in repo to builtin based on setup_git_directory_gently
2024-10-25 14:02:36 -04:00
Taylor Blau
dca32b8288 Merge branch 'pb/clar-build-fix'
Build fix.

* pb/clar-build-fix:
  Makefile: fix dependency for $(UNIT_TEST_DIR)/clar/clar.o
2024-10-25 14:02:25 -04:00
Taylor Blau
448022a7fb Merge branch 'bf/t-readme-mention-reftable'
Doc update.

* bf/t-readme-mention-reftable:
  t/README: add missing value for GIT_TEST_DEFAULT_REF_FORMAT
2024-10-25 14:02:21 -04:00
Taylor Blau
f25bb60393 Merge branch 'ak/typofix'
More typofixes.

* ak/typofix:
  t: fix typos
2024-10-25 14:02:08 -04:00
Taylor Blau
4d334e5205 Merge branch 'ak/typofixes'
Typofixes.

* ak/typofixes:
  t: fix typos
  t/helper: fix a typo
  t/perf: fix typos
  t/unit-tests: fix typos
  contrib: fix typos
  compat: fix typos
2024-10-25 14:02:04 -04:00
Taylor Blau
55bc7d54ab Merge branch 'ps/ci-gitlab-windows'
Enable Windows-based CI in GitLab.

* ps/ci-gitlab-windows:
  gitlab-ci: exercise Git on Windows
  gitlab-ci: introduce stages and dependencies
  ci: handle Windows-based CI jobs in GitLab CI
  ci: create script to set up Git for Windows SDK
  t7300: work around platform-specific behaviour with long paths on MinGW
2024-10-25 14:01:21 -04:00
Taylor Blau
6cbcc68ea7 Merge branch 'db/submodule-fetch-with-remote-name-fix'
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-10-25 14:01:09 -04:00
Seyi Kuforiji
91687cd13f t7011: ensure no whitespace after redirect
This change updates the script to conform to the coding standards
outlined in the Git project's documentation. According to the guidelines
in Documentation/CodingGuidelines under "Redirection operators", there
should be no whitespace after redirection operators.

Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-22 15:37:59 -04:00
Taylor Blau
fd3785337b The third batch
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-22 14:43:46 -04:00
Taylor Blau
8e08668322 Merge branch 'cw/worktree-relative'
An extra worktree attached to a repository points at each other to
allow finding the repository from the worktree and vice versa
possible.  Turn this linkage to relative paths.

* cw/worktree-relative:
  worktree: add test for path handling in linked worktrees
  worktree: link worktrees with relative paths
  worktree: refactor infer_backlink() to use *strbuf
  worktree: repair copied repository and linked worktrees
2024-10-22 14:40:39 -04:00
Taylor Blau
6ca9a05e63 Merge branch 'ps/cache-tree-w-broken-index-entry'
Fail gracefully instead of crashing when attempting to write the
contents of a corrupt in-core index as a tree object.

* ps/cache-tree-w-broken-index-entry:
  unpack-trees: detect mismatching number of cache-tree/index entries
  cache-tree: detect mismatching number of index entries
  cache-tree: refactor verification to return error codes
2024-10-22 14:40:38 -04:00
Chizoba ODINAKA
9e362dd060 t6050: avoid pipes with upstream Git commands
In pipes, the exit code of a chain of commands is determined by
the final command. In order not to miss the exit code of a failed
Git command, avoid pipes instead write output of Git commands
into a file.
For better debugging experience, instances of "grep" were changed
to "test_grep". "test_grep" provides more context in case of a
failed "grep".

Signed-off-by: Chizoba ODINAKA <chizobajames21@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-22 12:47:27 -04:00
Kousik Sanagavarapu
a73070fbd4 t4205: fix typo in 'NUL termination with --stat'
Correct "expected" to rightly terminate with NUL ie '\0' instead of '0'
which may have been typoed.

We didn't notice this before because the test is run with
"test_expect_failure", meaning the test would have been marked broken
anyways.

Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 17:37:11 -04:00
Kristoffer Haugsbakk
52acf6771b SubmittingPatches: tags -> trailers
“Trailer” is the preferred nomenclature in this project.  Also add a
definite article where I think it makes sense.

As we can see the rest of the document already prefers this term.  This
just gets rid of the last stragglers.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 17:23:00 -04:00
Patrick Steinhardt
2e7c6d2f41 ref-filter: format iteratively with lexicographic refname sorting
In bd98f9774e (ref-filter.c: filter & format refs in the same callback,
2023-11-14), we have introduced logic into the ref-filter subsystem that
determines whether or not we can output references iteratively instead
of first collecting all references, post-processing them and printing
them once done. This has the advantage that we don't have to store all
refs in memory and, when used with e.g. `--count=1`, that we don't have
to read all refs in the first place.

One restriction we have in place for that is that caller must not ask
for sorted refs, because there is no way to sort the refs without first
reading them all into an array. So the benefits can only be reaped when
explicitly asking for output not to be sorted.

But there is one exception here where we _can_ get away with sorting
refs while streaming: ref backends sort references returned by their
iterators in lexicographic order. So if the following conditions are all
true we can do iterative streaming:

  - There must be at most a single sorting specification, as otherwise
    we're not using plain lexicographic ordering.

  - The sorting specification must use the "refname".

  - The sorting specification must not be using any flags, like
    case-insensitive sorting.

Now the resulting logic does feel quite fragile overall, which makes me
a bit uneasy. But after thinking about this for a while I couldn't find
any obvious gaps in my reasoning. Furthermore, given that lexicographic
sorting order is the default in git-for-each-ref(1), this is likely to
benefit a whole lot of usecases out there.

The following benchmark executes git-for-each-ref(1) in a crafted repo
with 1 million references:

  Benchmark 1: git for-each-ref (revision = HEAD~)
    Time (mean ± σ):      6.756 s ±  0.014 s    [User: 3.004 s, System: 3.541 s]
    Range (min … max):    6.738 s …  6.784 s    10 runs

  Benchmark 2: git for-each-ref (revision = HEAD)
    Time (mean ± σ):      6.479 s ±  0.017 s    [User: 2.858 s, System: 3.422 s]
    Range (min … max):    6.450 s …  6.519 s    10 runs

  Summary
    git for-each-ref (revision = HEAD)
      1.04 ± 0.00 times faster than git for-each-ref (revision = HEAD~)

The change results in a slight performance improvement, but nothing that
would really stand out. Something that cannot be seen in the benchmark
though is peak memory usage, which went from 404.5MB to 68.96kB.

A more interesting benchmark is printing a single referenence with
`--count=1`:

  Benchmark 1: git for-each-ref --count=1 (revision = HEAD~)
    Time (mean ± σ):      6.655 s ±  0.018 s    [User: 2.865 s, System: 3.576 s]
    Range (min … max):    6.630 s …  6.680 s    10 runs

  Benchmark 2: git for-each-ref --count=1 (revision = HEAD)
    Time (mean ± σ):       8.6 ms ±   1.3 ms    [User: 2.3 ms, System: 6.1 ms]
    Range (min … max):     6.7 ms …  14.4 ms    266 runs

  Summary
    git git for-each-ref --count=1 (revision = HEAD)
    770.58 ± 116.19 times faster than git for-each-ref --count=1 (revision = HEAD~)

Whereas we scaled with the number of references before, we now print the
first reference and exit immediately, which provides a massive win.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 16:46:03 -04:00
Taylor Blau
34b6ce9b30 The third batch
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-18 14:01:50 -04:00
Taylor Blau
c1662a00b6 Merge branch 'ps/maintenance-start-crash-fix'
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-10-18 13:56:26 -04:00
Taylor Blau
2849552beb Merge branch 'xx/protocol-v2-doc-markup-fix'
Docfix.

* xx/protocol-v2-doc-markup-fix:
  Documentation/gitprotocol-v2.txt: fix a slight inconsistency in format
2024-10-18 13:56:25 -04:00
Taylor Blau
728ae63c05 Merge branch 'tc/bundle-uri-leakfix'
Leakfix.

* tc/bundle-uri-leakfix:
  bundle-uri: plug leak in unbundle_from_file()
2024-10-18 13:56:24 -04:00
Taylor Blau
645cc7a2a7 Merge branch 'kh/checkout-ignore-other-docfix'
Doc updates.

* kh/checkout-ignore-other-docfix:
  checkout: refer to other-worktree branch, not ref
2024-10-18 13:56:24 -04:00
Taylor Blau
4491734107 Merge branch 'kh/merge-tree-doc'
Docfix.

* kh/merge-tree-doc:
  doc: merge-tree: improve example script
2024-10-18 13:56:23 -04:00
Taylor Blau
6fe1b8cee0 Merge branch 'ng/rebase-merges-branch-name-as-label'
"git rebase --rebase-merges" now uses branch names as labels when
able.

* ng/rebase-merges-branch-name-as-label:
  rebase-merges: try and use branch names as labels
  rebase-update-refs: extract load_branch_decorations
  load_branch_decorations: fix memory leak with non-static filters
2024-10-18 13:56:22 -04:00
Taylor Blau
b967851417 Merge branch 'kn/loose-object-layer-wo-global-hash'
Code clean-up.

* kn/loose-object-layer-wo-global-hash:
  loose: don't rely on repository global state
2024-10-18 13:56:22 -04:00
Taylor Blau
ee064ba65a Merge branch 'jc/doc-refspec-syntax'
Doc updates.

* jc/doc-refspec-syntax:
  doc: clarify <src> in refspec syntax
2024-10-18 13:56:20 -04:00
Taylor Blau
020c16bdb9 Merge branch 'aa/t7300-modernize'
Test modernization.

* aa/t7300-modernize:
  t7300-clean.sh: use test_path_* helper functions for error logging
2024-10-18 13:54:43 -04:00
Patrick Steinhardt
20590cd287 reftable: handle trivial reftable_buf errors
Convert the reftable library such that we handle failures with the
new `reftable_buf` interfaces.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
591c6a600e reftable/stack: adapt stack_filename() to handle allocation failures
The `stack_filename()` function cannot pass any errors to the caller as it
has a `void` return type. Adapt it and its callers such that we can
handle errors and start handling allocation failures.

There are two interesting edge cases in `reftable_stack_destroy()` and
`reftable_addition_close()`. Both of these are trying to tear down their
respective structures, and while doing so they try to unlink some of the
tables they have been keeping alive. Any earlier attempts to do that may
fail on Windows because it keeps us from deleting such tables while they
are still open, and thus we re-try on close. It's okay and even expected
that this can fail when the tables are still open by another process, so
we handle the allocation failures gracefully and just skip over any file
whose name we couldn't figure out.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
4abc8022ff reftable/record: adapt reftable_record_key() to handle allocation failures
The `reftable_record_key()` function cannot pass any errors to the
caller as it has a `void` return type. Adapt it and its callers such
that we can handle errors and start handling allocation failures.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
e693ccf2c9 reftable/stack: adapt format_name() to handle allocation failures
The `format_name()` function cannot pass any errors to the caller as it
has a `void` return type. Adapt it and its callers such that we can
handle errors and start handling allocation failures.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
31eedd1d11 t/unit-tests: check for reftable_buf allocation errors
Adapt our unit tests to check for allocations errors.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
f177d49163 reftable/blocksource: adapt interface name
Adapt the name of the `strbuf` block source to no longer relate to this
interface, but instead to the `reftable_buf` interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
be4c070a3c reftable: convert from strbuf to reftable_buf
Convert the reftable library to use the `reftable_buf` interface instead
of the `strbuf` interface. This is mostly a mechanical change via sed(1)
with some manual fixes where functions for `strbuf` and `reftable_buf`
differ. The converted code does not yet handle allocation failures. This
will be handled in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:56 -04:00
Patrick Steinhardt
81eddda540 reftable/basics: provide new reftable_buf interface
Implement a new `reftable_buf` interface that will replace Git's own
`strbuf` interface. This is done due to three reasons:

  - The `strbuf` interfaces do not handle memory allocation failures and
    instead causes us to die. This is okay in the context of Git, but is
    not in the context of the reftable library, which is supposed to be
    usable by third-party applications.

  - The `strbuf` interface is quite deeply tied into Git, which makes it
    hard to use the reftable library as a standalone library. Any
    dependent would have to carefully extract the relevant parts of it
    to make things work, which is not all that sensible.

  - The `strbuf` interface does not use the pluggable allocators that
    can be set up via `reftable_set_alloc()`.

So we have good reasons to use our own type, and the implementation is
rather trivial. Implement our own type. Conversion of the reftable
library will be handled in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:55 -04:00
Patrick Steinhardt
7fa7e14ebe reftable: stop using strbuf_addf()
We're about to introduce our own `reftable_buf` type to replace
`strbuf`. One function we'll have to convert is `strbuf_addf()`, which
is used in a handful of places. This function uses `snprintf()`
internally, which makes porting it a bit more involved:

  - It is not available on all platforms.

  - Some platforms like Windows have broken implementations.

So by using `snprintf()` we'd also push the burden on downstream users
of the reftable library to make available a properly working version of
it.

Most callsites of `strbuf_addf()` are trivial to convert to not using
it. We do end up using `snprintf()` in our unit tests, but that isn't
much of a problem for downstream users of the reftable library.

While at it, remove a useless call to `strbuf_reset()` in
`t_reftable_stack_auto_compaction_with_locked_tables()`. We don't write
to the buffer before this and initialize it with `STRBUF_INIT`, so there
is no need to reset anything.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:55 -04:00
Patrick Steinhardt
409f04995e reftable: stop using strbuf_addbuf()
We're about to introduce our own `reftable_buf` type to replace
`strbuf`. Get rid of the seldomly-used `strbuf_addbuf()` function such
that we have to reimplement one less function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:59:55 -04:00
Andrew Kreimer
f1eea0b620 t: fix typos
Fix typos in documentation, comments, etc.

Via codespell.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:14:56 -04:00