When a bitmap is used to answer some reachability query, it creates a
pseudo-bitmap called the "extended index" on top of any existing bitmaps
to store objects that are relevant to the query, but not mentioned in
the bitmap.
When looking up the ith object in the extended index in a bitmap, it is
common to write something like:
bitmap_get(result, i + bitmap_num_objects(bitmap_git))
, indicating that we want the ith object following all other objects
mentioned in the bitmap_git.
Since the type of `i` and the return type of `bitmap_num_objects()` are
both `uint32_t`s, But if there are either a large number of objects in
the bitmap, or a large number of objects in the extended index (or
both), this addition can overflow when the sum is greater than 2^32-1.
Having that large of a bitmap position is entirely acceptable, but we
need to ensure that the computed bitmap position for that object is
performed using 64-bits and doesn't overflow.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a filesystem-level corruption occurs in a .bitmap file, Git can react
poorly. This could take the form of a run-time error due to failing to
parse an EWAH bitmap or be more subtle such as returning the wrong set
of objects to a fetch or clone.
A natural first response to either of these kinds of errors is to run
'git fsck' to see if any files are corrupt. This currently ignores all
.bitmap files.
Add checks to 'git fsck' for all .bitmap files that are currently
associated with a multi-pack-index or pack file. Verify their checksums
using the hashfile API.
We iterate through all multi-pack-indexes and pack-files to be sure to
check all .bitmap files, not just the one that would be read by the
process. For example, a multi-pack-index bitmap overrules a pack-bitmap.
However, if the multi-pack-index is removed, the pack-bitmap may be
selected instead. Be thorough to include every file that could become
active in such a way. This includes checking files in alternates.
There is potential that we could extend this effort to check the
structure of the reachability bitmaps themselves, but it is very
expensive to do so. At minimum, it's as expensive as generating the
bitmaps in the first place, and that's assuming that we don't use the
trivial algorithm of verifying each bitmap individually. The trivial
algorithm will result in quadratic behavior (number of objects times
number of bitmapped commits) while the bitmap building operation
constructs a lattice of commits to build bitmaps incrementally and then
generate the final bitmaps from a subset of those commits.
If we were to extend 'git fsck' to check .bitmap file contents more
closely like this, then we would likely want to hide it behind an option
that signals the user is more willing to do expensive operations such as
this.
For testing, set up a repository with a pack-bitmap _and_ a
multi-pack-index bitmap. This requires some file movement to avoid
deleting the pack-bitmap during the repack that creates the
multi-pack-index bitmap. We can then verify that 'git fsck' is checking
all files, not just the "active" bitmap.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.
* tb/pack-revindex-on-disk:
t: invert `GIT_TEST_WRITE_REV_INDEX`
config: enable `pack.writeReverseIndex` by default
pack-revindex: introduce `pack.readReverseIndex`
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: make `load_pack_revindex` take a repository
t5325: mark as leak-free
pack-write.c: plug a leak in stage_tmp_packfiles()
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
While parsing a .rev file, we check the header information to be sure it
makes sense. This happens before doing any additional validation such as
a checksum or value check. In order to differentiate between a bad
header and a non-existent file, we need to update the API for loading a
reverse index.
Make load_pack_revindex_from_disk() non-static and specify that a
positive value means "the file does not exist" while other errors during
parsing are negative values. Since an invalid header prevents setting up
the structures we would use for further validations, we can stop at that
point.
The place where we can distinguish between a missing file and a corrupt
file is inside load_revindex_from_disk(), which is used both by pack
rev-indexes and multi-pack-index rev-indexes. Some tests in t5326
demonstrate that it is critical to take some conditions to allow
positive error signals.
Add tests that check the three header values.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will introduce a `pack.readReverseIndex`
configuration, which forces Git to generate the reverse index from
scratch instead of loading it from disk.
In order to avoid reading this configuration value more than once, we'll
use the `repo_settings` struct to lazily load this value.
In order to access the `struct repo_settings`, add a repository argument
to `load_pack_revindex`, and update all callers to pass the correct
instance (in all cases, `the_repository`).
In certain instances, a new function-local variable is introduced to
take the place of a `struct repository *` argument to the function
itself to avoid propagating the new parameter even further throughout
the tree.
Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
Assorted config API updates.
* ab/config-multi-and-nonbool:
for-each-repo: with bad config, don't conflate <path> and <cmd>
config API: add "string" version of *_value_multi(), fix segfaults
config API users: test for *_get_value_multi() segfaults
for-each-repo: error on bad --config
config API: have *_multi() return an "int" and take a "dest"
versioncmp.c: refactor config reading next commit
config API: add and use a "git_config_get()" family of functions
config tests: add "NULL" tests for *_get_value_multi()
config tests: cover blind spots in git_die_config() tests
Fix numerous and mostly long-standing segfaults in consumers of
the *_config_*value_multi() API. As discussed in the preceding commit
an empty key in the config syntax yields a "NULL" string, which these
users would give to strcmp() (or similar), resulting in segfaults.
As this change shows, most users users of the *_config_*value_multi()
API didn't really want such an an unsafe and low-level API, let's give
them something with the safety of git_config_get_string() instead.
This fix is similar to what the *_string() functions and others
acquired in[1] and [2]. Namely introducing and using a safer
"*_get_string_multi()" variant of the low-level "_*value_multi()"
function.
This fixes segfaults in code introduced in:
- d811c8e17c (versionsort: support reorder prerelease suffixes, 2015-02-26)
- c026557a37 (versioncmp: generalize version sort suffix reordering, 2016-12-08)
- a086f921a7 (submodule: decouple url and submodule interest, 2017-03-17)
- a6be5e6764 (log: add log.excludeDecoration config option, 2020-04-16)
- 92156291ca (log: add default decoration filter, 2022-08-05)
- 50a044f1e4 (gc: replace config subprocesses with API calls, 2022-09-27)
There are now two users ofthe low-level API:
- One in "builtin/for-each-repo.c", which we'll convert in a
subsequent commit.
- The "t/helper/test-config.c" code added in [3].
As seen in the preceding commit we need to give the
"t/helper/test-config.c" caller these "NULL" entries.
We could also alter the underlying git_configset_get_value_multi()
function to be "string safe", but doing so would leave no room for
other variants of "*_get_value_multi()" that coerce to other types.
Such coercion can't be built on the string version, since as we've
established "NULL" is a true value in the boolean context, but if we
coerced it to "" for use in a list of strings it'll be subsequently
coerced to "false" as a boolean.
The callback pattern being used here will make it easy to introduce
e.g. a "multi" variant which coerces its values to "bool", "int",
"path" etc.
1. 40ea4ed903 (Add config_error_nonbool() helper function,
2008-02-11)
2. 6c47d0e8f3 (config.c: guard config parser from value=NULL,
2008-02-11).
3. 4c715ebb96 (test-config: add tests for the config_set API,
2014-07-28)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Have the "git_configset_get_value_multi()" function and its siblings
return an "int" and populate a "**dest" parameter like every other
git_configset_get_*()" in the API.
As we'll take advantage of in subsequent commits, this fixes a blind
spot in the API where it wasn't possible to tell whether a list was
empty from whether a config key existed. For now we don't make use of
those new return values, but faithfully convert existing API users.
Most of this is straightforward, commentary on cases that stand out:
- To ensure that we'll properly use the return values of this function
in the future we're using the "RESULT_MUST_BE_USED" macro introduced
in [1].
As git_die_config() now has to handle this return value let's have
it BUG() if it can't find the config entry. As tested for in a
preceding commit we can rely on getting the config list in
git_die_config().
- The loops after getting the "list" value in "builtin/gc.c" could
also make use of "unsorted_string_list_has_string()" instead of using
that loop, but let's leave that for now.
- In "versioncmp.c" we now use the return value of the functions,
instead of checking if the lists are still non-NULL.
1. 1e8697b5c4 (submodule--helper: check repo{_submodule,}_init()
return values, 2022-09-01),
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the change in the last commit to move several functions to
write-or-die.h, csum-file.h no longer needs to include cache.h.
However, removing that include forces several other C files, which
directly or indirectly dependend upon csum-file.h's inclusion of
cache.h, to now be more explicit about their dependencies.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More work towards -Wunused.
* jk/unused-post-2.39-part2: (21 commits)
help: mark unused parameter in git_unknown_cmd_config()
run_processes_parallel: mark unused callback parameters
userformat_want_item(): mark unused parameter
for_each_commit_graft(): mark unused callback parameter
rewrite_parents(): mark unused callback parameter
fetch-pack: mark unused parameter in callback function
notes: mark unused callback parameters
prio-queue: mark unused parameters in comparison functions
for_each_object: mark unused callback parameters
list-objects: mark unused callback parameters
mark unused parameters in signal handlers
run-command: mark error routine parameters as unused
mark "pointless" data pointers in callbacks
ref-filter: mark unused callback parameters
http-backend: mark unused parameters in virtual functions
http-backend: mark argc/argv unused
object-name: mark unused parameters in disambiguate callbacks
serve: mark unused parameters in virtual functions
serve: use repository pointer to get config
ls-refs: drop config caching
...
Our graph-traversal functions take callbacks for showing commits and
objects, but not all callbacks need each parameter. Likewise for the
similar traverse_bitmap_commit_list(), which has a different interface
but serves the same purpose. And the include_check mechanism, which
passes along a void pointer which is not always used.
Mark the unused ones to to make -Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we find a midx bitmap, we do not bother checking for pack
bitmaps, since we can use only one. But since we will warn of unused
bitmaps via trace2, let's continue looking for pack bitmaps when
tracing is enabled.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After opening a bitmap successfully, we try opening others only
because we want to report that other bitmap files are ignored in
the trace2 log. When trace2 is not enabled, we do not have to
do any of that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "open_midx_bitmap_1()" and "open_pack_bitmap_1()", when we find that
there are multiple bitmaps, we will only open the first one and then
leave warnings about the remaining pack information, the information
will contain the absolute path of the repository, for example in a
alternates usage scenario. So let's hide this kind of potentially
sensitive information in this commit.
Found-by: XingXin <moweng.xx@antgroup.com>
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When trying to open a pack bitmap, we call open_pack_bitmap_1() in a
loop, during which it tries to open up the pack index corresponding
with each available pack.
It's likely that we'll end up relying on objects in that pack later
in the process (in which case we're doing the work of opening the
pack index optimistically), but not guaranteed.
For instance, consider a repository with a large number of small
packs, and one large pack with a bitmap. If we see that bitmap pack
last in our loop which calls open_pack_bitmap_1(), the current code
will have opened *all* pack index files in the repository. If the
request can be served out of the bitmapped pack alone, then the time
spent opening these idx files was wasted.S
Since open_pack_bitmap_1() calls is_pack_valid() later on (which in
turns calls open_pack_index() itself), we can just drop the earlier
call altogether.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The trace2 region around the call to lazy_bitmap_for_commit() in
bitmap_for_commit() was added in 28cd730680 (pack-bitmap: prepare to
read lookup table extension, 2022-08-14). While adding trace2 regions is
typically helpful for tracking performance, this method is called
possibly thousands of times as a commit walk explores commit history
looking for a matching bitmap. When trace2 output is enabled, this
region is emitted many times and performance is throttled by that
output.
For now, remove these regions entirely.
This is a critical path, and it would be valuable to measure that the
time spent in bitmap_for_commit() does not increase when using the
commit lookup table. The best way to do that would be to use a mechanism
that sums the time spent in a region and reports a single value at the
end of the process. This technique was introduced but not merged by [1]
so maybe this example presents some justification to revisit that
approach.
[1] https://lore.kernel.org/git/pull.1099.v2.git.1640720202.gitgitgadget@gmail.com/
To help with the 'git blame' output in this region, add a comment that
warns against adding a trace2 region. Delete a test from t5310 that used
that trace output to check that this lookup optimization was activated.
To create this kind of test again in the future, the stopwatch traces
mentioned earlier could be used as a signal that we activated this code
path.
Helpedy-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier change teaches Git to write bitmap lookup table. But Git
does not know how to parse them.
Teach Git to parse the existing bitmap lookup table. The older
versions of Git are not affected by it. Those versions ignore the
lookup table.
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tweak various messages that come from the pack-bitmap codepaths.
* tl/pack-bitmap-error-messages:
pack-bitmap.c: continue looping when first MIDX bitmap is found
pack-bitmap.c: using error() instead of silently returning -1
pack-bitmap.c: do not ignore error when opening a bitmap file
pack-bitmap.c: rename "idx_name" to "bitmap_name"
pack-bitmap.c: mark more strings for translations
pack-bitmap.c: fix formatting of error messages
In "open_midx_bitmap()", we do a loop with the MIDX(es) in repo, when
the first one has been found, then will break out by a "return"
directly.
But actually, it's better to continue the loop until we have visited
both the MIDX in our repository, as well as any alternates (along with
_their_ alternates, recursively).
The reason for this is, there may exist more than one MIDX file in
a repo. The "multi_pack_index" struct is actually designed as a singly
linked list, and if a MIDX file has been already opened successfully,
then the other MIDX files will be skipped and left with a warning
"ignoring extra bitmap file." to the output.
The discussion link of community:
https://public-inbox.org/git/YjzCTLLDCby+kJrZ@nand.local/
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "open_pack_bitmap_1()" and "open_midx_bitmap_1()", it's better to
return error() instead of "-1" when some unexpected error occurs like
"stat bitmap file failed", "bitmap header is invalid" or "checksum
mismatch", etc.
There are places where we do not replace, such as when the bitmap
does not exist (no bitmap in repository is allowed) or when another
bitmap has already been opened (in which case it should be a warning
rather than an error).
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calls to git_open() to open the pack bitmap file and
multi-pack bitmap file do not report any error when they
fail. These files are optional and it is not an error if
open failed due to ENOENT, but we shouldn't be ignoring
other kinds of errors.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "open_pack_bitmap_1()" and "open_midx_bitmap_1()" we use
a var named "idx_name" to represent the bitmap filename which
is computed by "midx_bitmap_filename()" or "pack_bitmap_filename()"
before we open it.
There may bring some confusion in this "idx_name" naming, which
might lead us to think of ".idx "or" multi-pack-index" files,
although bitmap is essentially can be understood as a kind of index,
let's define this name a little more accurate here.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In pack-bitmap.c, some printed texts are translated, some are not.
Let's support the translations of the bitmap related output.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some text output issues in 'pack-bitmap.c', they exist in
die(), error() etc. This includes issues with capitalization the
first letter, newlines, error() instead of BUG(), and substitution
that don't have quotes around them.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index code did not protect the packfile it is going
to depend on from getting removed while in use, which has been
corrected.
* tb/midx-race-in-pack-objects:
builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
builtin/pack-objects.c: ensure included `--stdin-packs` exist
builtin/pack-objects.c: avoid redundant NULL check
pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
When pack-objects adds an entry to its packing list, it marks the
packfile and offset containing the object, which we may later use during
verbatim reuse (c.f., `write_reused_pack_verbatim()`).
If the packfile in question is deleted in the background (e.g., due to a
concurrent `git repack`), we'll die() as a result of calling use_pack(),
unless we have an open file descriptor on the pack itself. 4c08018204
(pack-objects: protect against disappearing packs, 2011-10-14) worked
around this by opening the pack ahead of time before recording it as a
valid source for reuse.
4c08018204's treatment meant that we could tolerate disappearing packs,
since it ensures we always have an open file descriptor on any pack that
we mark as a valid source for reuse. This tightens the race to only
happen when we need to close an open pack's file descriptor (c.f., the
caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted,
in which case we'll complain that a pack could not be accessed and
die().
The pack bitmap code does this, too, since prior to dc1daacdcc
(pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it
was vulnerable to the same race.
The MIDX bitmap code does not do this, and is vulnerable to the same
race. Apply the same treatment as dc1daacdcc to the routine responsible
for opening the multi-pack bitmap's preferred pack to close this race.
This patch handles the "preferred" pack (c.f., the section
"multi-pack-index reverse indexes" in
Documentation/technical/pack-format.txt) specially, since pack-objects
depends on reusing exact chunks of that pack verbatim in
reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded,
the utility of a bitmap is significantly diminished.
Similar to dc1daacdcc, we could technically just add this check in
reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX
.bitmap without needing to open any of its packs. But it's simpler to do
the check as early as possible, covering all direct uses of the
preferred pack. Note that doing this check early requires us to call
prepare_midx_pack() early, too, so move the relevant part of that loop
from load_reverse_index() into open_midx_bitmap_1().
Subsequent patches handle the non-preferred packs in a slightly
different fashion.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.
* ep/maint-equals-null-cocci:
tree-wide: apply equals-null.cocci
tree-wide: apply equals-null.cocci
contrib/coccinnelle: add equals-null.cocci
Now that all consumers of traverse_commit_list_filtered() populate the
'filter' member of 'struct rev_info', we can drop that parameter from
the method prototype to simplify things. In addition, the only thing
different now between traverse_commit_list_filtered() and
traverse_commit_list() is the presence of the 'omitted' parameter, which
is only non-NULL for one caller. We can consolidate these two methods by
having one call the other and use the simpler form everywhere the
'omitted' parameter would be NULL.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all consumers of prepare_bitmap_walk() have populated the
'filter' member of 'struct rev_info', we can drop that extra parameter
from the method and access it directly from the 'struct rev_info'.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or
open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs.
By design, these functions are supposed to be called over every pack and
MIDX, since only one of them should have a valid bitmap.
Ordinarily we return '0' from these two functions in order to indicate
that we successfully loaded a bitmap To signal that we couldn't load a
bitmap corresponding to the MIDX/pack (either because one doesn't exist,
or because there was an error with loading it), we can return '-1'. In
either case, the callers each enumerate all MIDXs/packs to ensure that
at most one bitmap per-kind is present.
But when we fail to load a bitmap that does exist (for example, loading
a MIDX bitmap without finding a corresponding reverse index), we'll
return -1 but leave the 'midx' field non-NULL. So when we fallback to
loading a pack bitmap, we'll complain that the bitmap we're trying to
populate already is "opened", even though it isn't.
Rectify this by setting the '->pack' and '->midx' field back to NULL as
appropriate. Two tests are added: one to ensure that the MIDX-to-pack
bitmap fallback works, and another to ensure we still complain when
there are multiple pack bitmaps in a repository.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If prepare_bitmap_git() returns NULL (one easy-to-trigger cause being
that the repository does not have bitmaps at all), then we'll segfault
accessing bitmap_git->hashes:
$ t/helper/test-tool bitmap dump-hashes
Segmentation fault
We should treat this the same as a repository with bitmaps but no
name-hashes, and quietly produce an empty output. The later call to
free_bitmap_index() in the cleanup label is OK, as it treats a NULL
pointer as a noop.
This isn't a big deal in practice, as this function is intended for and
used only by test-tool. It's probably worth fixing to avoid confusion,
but not worth adding coverage for this to the test suite.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function free_bitmap_index() is somewhat lax in what it frees. There
are two notable examples:
- While it does call kh_destroy_oid_map on the "bitmaps" map, which
maps commit OIDs to their corresponding bitmaps, the bitmaps
themselves are not freed. Note here that we recycle already-freed
ewah_bitmaps into a pool, but these are handled correctly by
ewah_pool_free().
- We never bother to free the extended index's "positions" map, which
we always allocate in load_bitmap().
Fix both of these.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_bitmap_walk() is used to implement `git rev-list --test-bitmap`,
which compares the result of the on-disk bitmaps with ones generated
on-the-fly during a revision walk.
In fa95666a40 (pack-bitmap.c: harden 'test_bitmap_walk()' to check type
bitmaps, 2021-08-24), we hardened those tests to also check the four
special type-level bitmaps, but never freed those bitmaps. We should
have, since each required an allocation when we EWAH-decompressed them.
Free those, plugging that leak, and also free the base (the scratch-pad
bitmap), too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To ask for the name of a MIDX and its corresponding .rev file, callers
invoke get_midx_filename() and get_midx_rev_filename(), respectively.
These both invoke xstrfmt(), allocating a chunk of memory which must be
freed later on.
This makes callers in pack-bitmap.c somewhat awkward. Specifically,
midx_bitmap_filename(), which is implemented like:
return xstrfmt("%s-%s.bitmap",
get_midx_filename(midx->object_dir),
hash_to_hex(get_midx_checksum(midx)));
this leaks the second argument to xstrfmt(), which itself was allocated
with xstrfmt(). This caller could assign both the result of
get_midx_filename() and the outer xstrfmt() to a temporary variable,
remembering to free() the former before returning. But that involves a
wasteful copy.
Instead, get_midx_filename() and get_midx_rev_filename() take a strbuf
as an output parameter. This way midx_bitmap_filename() can manipulate
and pass around a temporary buffer which it detaches back to its caller.
That allows us to implement the function without copying or open-coding
get_midx_filename() in a way that doesn't leak.
Update the other callers of get_midx_filename() and
get_midx_rev_filename() accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack" has been taught to generate multi-pack reachability
bitmaps.
* tb/repack-write-midx:
test-read-midx: fix leak of bitmap_index struct
builtin/repack.c: pass `--refs-snapshot` when writing bitmaps
builtin/repack.c: make largest pack preferred
builtin/repack.c: support writing a MIDX while repacking
builtin/repack.c: extract showing progress to a variable
builtin/repack.c: rename variables that deal with non-kept packs
builtin/repack.c: keep track of existing packs unconditionally
midx: preliminary support for `--refs-snapshot`
builtin/multi-pack-index.c: support `--stdin-packs` mode
midx: expose `write_midx_file_only()` publicly
When repacking into a geometric series and writing a multi-pack bitmap,
it is beneficial to have the largest resulting pack be the preferred
object source in the bitmap's MIDX, since selecting the large packs can
lead to fewer broken delta chains and better compression.
Teach 'git repack' to identify this pack and pass it to the MIDX write
machinery in order to mark it as preferred.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>