mirrors/git - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Patrick Steinhardt	f234df07f6	reftable/stack: handle locked tables during auto-compaction When compacting tables, it may happen that we want to compact a set of tables which are already locked by a concurrent process that compacts them. In the case where we wanted to perform a full compaction of all tables it is sensible to bail out in this case, as we cannot fulfill the requested action. But when performing auto-compaction it isn't necessarily in our best interest of us to abort the whole operation. For example, due to the geometric compacting schema that we use, it may be that process A takes a lot of time to compact the bulk of all tables whereas process B appends a bunch of new tables to the stack. B would in this case also notice that it has to compact the tables that process A is compacting already and thus also try to compact the same range, probably including the new tables it has appended. But because those tables are locked already, it will fail and thus abort the complete auto-compaction. The consequence is that the stack will grow longer and longer while A isn't yet done with compaction, which will lead to a growing performance impact. Instead of aborting auto-compaction altogether, let's gracefully handle this situation by instead compacting tables which aren't locked. To do so, instead of locking from the beginning of the slice-to-be-compacted, we start locking tables from the end of the slice. Once we hit the first table that is locked already, we abort. If we succeeded to lock two or more tables, then we simply reduce the slice of tables that we're about to compact to those which we managed to lock. This ensures that we can at least make some progress for compaction in said scenario. It also helps in other scenarios, like for example when a process died and left a stale lockfile behind. In such a case we can at least ensure some compaction on a best-effort basis. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:43 -07:00
Patrick Steinhardt	ed1ad6b44d	reftable/stack: fix corruption on concurrent compaction The locking employed by compaction uses the following schema: 1. Lock "tables.list" and verify that it matches the version we have loaded in core. 2. Lock each of the tables in the user-supplied range of tables that we are supposed to compact. These locks prohibit any concurrent process to compact those tables while we are doing that. 3. Unlock "tables.list". This enables concurrent processes to add new tables to the stack, but also allows them to compact tables outside of the range of tables that we have locked. 4. Perform the compaction. 5. Lock "tables.list" again. 6. Move the compacted table into place. 7. Write the new order of tables, including the compacted table, into the lockfile. 8. Commit the lockfile into place. Letting concurrent processes modify the "tables.list" file while we are doing the compaction is very much part of the design and thus expected. After all, it may take some time to compact tables in the case where we are compacting a lot of very large tables. But there is a bug in the code. Suppose we have two processes which are compacting two slices of the table. Given that we lock each of the tables before compacting them, we know that the slices must be disjunct from each other. But regardless of that, compaction performed by one process will always impact what the other process needs to write to the "tables.list" file. Right now, we do not check whether the "tables.list" has been changed after we have locked it for the second time in (5). This has the consequence that we will always commit the old, cached in-core tables to disk without paying to respect what the other process has written. This scenario would then lead to data loss and corruption. This can even happen in the simpler case of one compacting process and one writing process. The newly-appended table by the writing process would get discarded by the compacting process because it never sees the new table. Fix this bug by re-checking whether our stack is still up to date after locking for the second time. If it isn't, then we adjust the indices of tables to replace in the updated stack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:43 -07:00
Patrick Steinhardt	128b9aa3e9	reftable/stack: use lock_file when adding table to "tables.list" When modifying "tables.list", we need to lock the list before updating it to ensure that no concurrent writers modify the list at the same point in time. While we do this via the `lock_file` subsystem when compacting the stack, we manually handle the lock when adding a new table to it. While not wrong, it is at least inconsistent. Refactor the code to consistently lock "tables.list" via the `lock_file` subsytem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:43 -07:00
Patrick Steinhardt	7ee307da1b	reftable/stack: do not die when fsyncing lock file files We use `fsync_component_or_die()` when committing an addition to the "tables.list" lock file, which unsurprisingly dies in case the fsync fails. Given that this is part of the reftable library, we should never die and instead let callers handle the error. Adapt accordingly and use `fsync_component()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:43 -07:00
Patrick Steinhardt	558f6fbeb1	reftable/stack: simplify tracking of table locks When compacting tables, we store the locks of all tables we are about to compact in the `table_locks` array. As we currently only ever compact all tables in the user-provided range or none, we simply track those locks via the indices of the respective tables in the merged stack. This is about to change though, as we will introduce a mode where auto compaction gracefully handles the case of already-locked files. In this case, it may happen that we only compact a subset of the user-supplied range of tables. In this case, the indices will not necessarily match the lock indices anymore. Refactor the code such that we track the number of locks via a separate variable. The resulting code is expected to perform the same, but will make it easier to perform the described change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:42 -07:00
Patrick Steinhardt	5f0ed603a1	reftable/stack: update stats on failed full compaction When auto-compaction fails due to a locking error, we update the statistics to indicate this failure. We're not doing the same when performing a full compaction. Fix this inconsistency by using `stack_compact_range_stats()`, which handles the stat update for us. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:42 -07:00
Patrick Steinhardt	8030100bda	reftable/stack: test compaction with already-locked tables We're lacking test coverage for compacting tables when some of the tables that we are about to compact are locked. Add two tests that exercise this, one for auto-compaction and one for full compaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:42 -07:00
Patrick Steinhardt	9a833ca35d	reftable/stack: extract function to setup stack with N tables We're about to add two tests, and both of them will want to initialize the reftable stack with a set of N tables. Introduce a new function that handles this and refactor existing tests that use such a setup to use it. Note that this changes the exact records contained in the preexisting tests. This is fine though as we only care about the shape of the stack here, not the shape of each table. Furthermore, with this change we now start to disable auto compaction when writing the tables, as otherwise we might not end up with the expected amount of new tables added. This also slightly changes the behaviour of these tests, but the properties we care for remain intact. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:42 -07:00
Patrick Steinhardt	ed7d2f4770	reftable/stack: refactor function to gather table sizes Refactor the function that gathers table sizes to be more idiomatic. For one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`. Second, avoid using an integer to iterate through the tables in the reftable stack given that `stack_len` itself is using a `size_t`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 10:14:41 -07:00
shejialuo	1c31be45b3	fsck: add ref name check for files backend The git-fsck(1) only implicitly checks the reference, it does not fully check refs with bad format name such as standalone "@". However, a file ending with ".lock" should not be marked as having a bad ref name. It is expected that concurrent writers may have such lock files. We currently ignore this situation. But for bare ".lock" file, we will report it as error. In order to provide such checks, add a new fsck message id "badRefName" with default ERROR type. Use existing "check_refname_format" to explicit check the ref name. And add a new unit test to verify the functionality. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:53 -07:00
shejialuo	a7600b8481	files-backend: add unified interface for refs scanning For refs and reflogs, we need to scan its corresponding directories to check every regular file or symbolic link which shares the same pattern. Introduce a unified interface for scanning directories for files-backend. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:53 -07:00
shejialuo	bf061d26c7	builtin/refs: add verify subcommand Introduce a new subcommand "verify" in git-refs(1) to allow the user to check the reference database consistency and also this subcommand will be used as the entry point of checking refs for "git-fsck(1)". Add "verbose" field into "fsck_options" to indicate whether we should print verbose messages when checking refs and objects consistency. Remove bit-field for "strict" field, this is because we cannot take address of a bit-field which makes it unhandy to set member variables when parsing the command line options. The "git-fsck(1)" declares "fsck_options" variable with "static" identifier which avoids complaint by the leak-checker. However, in "git-refs verify", we need to do memory clean manually. Thus add "fsck_options_clear" function in "fsck.c" to provide memory clean operation. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:53 -07:00
shejialuo	ab6f79d8df	refs: set up ref consistency check infrastructure The "struct ref_store" is the base class which contains the "be" pointer which provides backend-specific functions whose interfaces are defined in the "ref_storage_be". We could reuse this polymorphism to define only one interface. For every backend, we need to provide its own function pointer. The interfaces defined in the `ref_storage_be` are carefully structured in semantic. It's organized as the five parts: 1. The name and the initialization interfaces. 2. The ref transaction interfaces. 3. The ref internal interfaces (pack, rename and copy). 4. The ref filesystem interfaces. 5. The reflog related interfaces. To keep consistent with the git-fsck(1), add a new interface named "fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be grouped into any above five categories. Explicitly add blank line to make it different from others. Last, implement placeholder functions for each ref backends. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:53 -07:00
shejialuo	2de307cdb2	fsck: add refs report function Introduce a new struct "fsck_ref_report" to contain the information we need when reporting refs-related messages. With the new "fsck_vreport" function, add a new function "fsck_report_ref" to report refs-related fsck error message. Unlike "report" function uses the exact parameters, we simply pass "struct fsck_ref_report *report" as the parameter. This is because at current we don't know exactly how many fields we need. By passing this parameter, we don't need to change this function prototype when we want to add more information into "fsck_ref_report". We have introduced "fsck_report_ref" function to report the error message for refs. We still need to add the corresponding callback function. Create refs-specific "error_func" callback "fsck_refs_error_function". Last, add "FSCK_REFS_OPTIONS_DEFAULT" macro to create default options when checking ref consistency. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:52 -07:00
shejialuo	3473d18fad	fsck: add a unified interface for reporting fsck messages The static function "report" provided by "fsck.c" aims at checking error type and calling the callback "error_func" to report the message. Both refs and objects need to check the error type of the current fsck message. In order to extract this common behavior, create a new function "fsck_vreport". Instead of using "...", provide "va_list" to allow more flexibility. Instead of changing "report" prototype to be align with the "fsck_vreport" function, we leave the "report" prototype unchanged due to the reason that there are nearly 62 references about "report" function. Simply change "report" function to use "fsck_vreport" to report objects related messages. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:52 -07:00
shejialuo	0ec5dfe8c4	fsck: make "fsck_error" callback generic The "fsck_error" callback is designed to report the objects-related error messages. It accepts two parameter "oid" and "object_type" which is not generic. In order to provide a unified callback which can report either objects or refs, remove the objects-related parameters and add the generic parameter "void *fsck_report". Create a new "fsck_object_report" structure which incorporates the removed parameters "oid" and "object_type". Then change the corresponding references to adapt to new "fsck_error" callback. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:52 -07:00
shejialuo	8cd4a447b8	fsck: rename objects-related fsck error functions The names of objects-related fsck error functions are generic. It's OK when there is only object database check. However, we are going to introduce refs database check report function. To avoid ambiguity, rename object-related fsck error functions to explicitly indicate these functions are used to report objects-related messages. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:52 -07:00
shejialuo	2d79aa9095	fsck: rename "skiplist" to "skip_oids" The "skiplist" field in "fsck_options" is related to objects. Because we are going to introduce ref consistency check, the "skiplist" name is too general which will make the caller think "skiplist" is related to both the refs and objects. It may seem that for both refs and objects, we should provide a general "skiplist" here. However, the type for "skiplist" is `struct oidset` which is totally unsuitable for refs. To avoid above ambiguity, rename "skiplist" to "skip_oids". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:36:52 -07:00
Patrick Steinhardt	6f1e9394e2	object: fix leaking packfiles when closing object store When calling `raw_object_store_clear()`, we close and free several resources associated with the object store. Part of that is to close and free all the packfiles, which is handled by `close_object_store()`. That function really only ends up closing the packfiles though, but it doesn't free them. And in fact it can't, as that function is being called via `run_command()` when `close_object_store = 1`, which is done e.g. when we execute git-maintenance(1). At that point, other structures may still have references on those packfiles, and thus we cannot free them here. So while it is in fact intentional that we really only close them, the result is a memory leak because `raw_object_store_clear()` does not free them, either. Fix the leak by freeing the packfiles in `raw_object_store_clear()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:22:21 -07:00
Patrick Steinhardt	fa0f27a19d	submodule: fix leaking seen submodule names We keep track of submodules we have already seen via a string map such that we don't process the same submodule twice. We never free that map though, causing a memory leak. Fix this leak by clearing the map. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:22:21 -07:00
Patrick Steinhardt	1a7e5efdb0	submodule: fix leaking fetch tasks When done with a fetch task used for parallel fetches of submodules, we need to both call `fetch_task_release()` to release the task's contents and `free()` to release the task itself. Most sites do this already, but some only call `fetch_task_release()` and thus leak memory. While we could trivially fix this by adding the two missing calls to free(3P), the result would be that we always call both functions. Let's thus refactor the code such that `fetch_task_release()` also frees the structure itself. Rename it to `fetch_task_free()` accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:22:21 -07:00
Patrick Steinhardt	c369fc46d0	builtin/submodule: allow "add" to use different ref storage format Same as with "clone", users may want to add a submodule to a repository with a non-default ref storage format. Wire up a new `--ref-format=` option that works the same as for `git submodule clone`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:22:21 -07:00
Patrick Steinhardt	fb99dded31	refs: fix ref storage format for submodule ref stores When opening a submodule ref storage we accidentally use the ref storage format of the owning repository, not of the submodule repository. As submodules may have a different storage format than their parent repo this can lead to bugs when trying to access the submodule ref storage from the parent repository. One such bug was reported when performing a recursive pull with mixed ref stores, which fails with: $ git pull --recursive fatal: Unable to find current revision in submodule path 'path/to/sub' The same issue occurs when adding a repository contained in the working tree with a different ref storage format via `git submodule add`. Fix the bug by using the submodule repository's ref storage format instead and add some tests. Note that the test for `git submodule status` was included as a precaution, only. The command worked alright even without the bugfix. Reported-by: Jeppe Øland <joland@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:22:21 -07:00
Patrick Steinhardt	69814846ab	builtin/clone: propagate ref storage format to submodules When recursively cloning a repository with a non-default ref storage format, e.g. by passing the `--ref-format=` option, then only the top-level repository will end up using that ref storage format, and all recursively cloned submodules will instead use the default format. While mixed-format constellations are expected to work alright, the outcome still is somewhat surprising as we have essentially ignored the user's request. Fix this by propagating the requested ref format to cloned submodules. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:21:39 -07:00
Patrick Steinhardt	5ac781ad62	builtin/submodule: allow cloning with different ref storage format As submodules are proper self-contained repositories, it is perfectly valid for them to have a different ref storage format than their parent repository. There is no obvious way for users to ask for the ref storage format when initializing submodules though. Whether the setup of such mixed-ref-storage-format constellations is all that useful remains to be seen. But there is no good reason to not expose such an option, and we will require it in a subsequent patch. Introduce a new `--ref-format=` option for git-submodule(1) that allows the user to pick the ref storage format. This option will also be used in a subsequent commit, where we start to propagate the same flag from git-clone(1) to cloning submodules with the `--recursive` switch. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:20:49 -07:00
Patrick Steinhardt	d9ab8788e1	git-submodule.sh: break overly long command lines For most of the subcommands of git-submodule(1), we end up passing a bunch of arguments to the submodule helper. This quickly leads to overly long lines, where it becomes hard to spot what has changed when one needs to modify them. Break up these lines into one argument per line, similarly to how it is done for the "clone" subcommand already. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:20:48 -07:00
Patrick Steinhardt	6ce8ffe30e	transport: mark more tests leak-free After fixing a transport leak, a few more tests have become leak-free. Mark them as such. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-08 09:16:21 -07:00
Junio C Hamano	448d51d549	transport: fix leak with transport helper URLs Transport URLs can be prefixed with "foo::", which would tell us that the transport uses a remote helper called "foo". We extract the helper name by `xstrndup()`ing the prefix before the double-colons, but never free that string. Fix this leak by assigning the result to a separate local variable that we can then free upon returning. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-07 17:38:31 -07:00
Junio C Hamano	92a29c2c39	Merge branch 'ps/refs-wo-the-repository' into ps/config-wo-the-repository * ps/refs-wo-the-repository: refs/reftable: stop using `the_repository` refs/packed: stop using `the_repository` refs/files: stop using `the_repository` refs/files: stop using `the_repository` in `parse_loose_ref_contents()` refs: stop using `the_repository`	2024-08-07 14:13:20 -07:00
Junio C Hamano	90b801d8ff	Merge branch 'ps/leakfixes-part-3' into ps/leakfixes-part-4 * ps/leakfixes-part-3: (24 commits) commit-reach: fix trivial memory leak when computing reachability convert: fix leaking config strings entry: fix leaking pathnames during delayed checkout object-name: fix leaking commit list items t/test-repository: fix leaking repository builtin/credential-cache: fix trivial leaks builtin/worktree: fix leaking derived branch names builtin/shortlog: fix various trivial memory leaks builtin/rerere: fix various trivial memory leaks builtin/credential-store: fix leaking credential builtin/show-branch: fix several memory leaks builtin/rev-parse: fix memory leak with `--parseopt` builtin/stash: fix various trivial memory leaks builtin/remote: fix various trivial memory leaks builtin/remote: fix leaking strings in `branch_list` builtin/ls-remote: fix leaking `pattern` strings builtin/submodule--helper: fix leaking buffer in `is_tip_reachable` builtin/submodule--helper: fix leaking clone depth parameter builtin/name-rev: fix various trivial memory leaks builtin/describe: fix trivial memory leak when describing blob ...	2024-08-06 12:40:41 -07:00
Taylor Blau	fcb2205b77	midx: implement support for writing incremental MIDX chains Now that the rest of the MIDX subsystem and relevant callers have been updated to learn about how to read and process incremental MIDX chains, let's finally update the implementation in `write_midx_internal()` to be able to write incremental MIDX chains. This new feature is available behind the `--incremental` option for the `multi-pack-index` builtin, like so: $ git multi-pack-index write --incremental The implementation for doing so is relatively straightforward, and boils down to a handful of different kinds of changes implemented in this patch: - The `compute_sorted_entries()` function is taught to reject objects which appear in any existing MIDX layer. - Functions like `write_midx_revindex()` are adjusted to write pack_order values which are offset by the number of objects in the base MIDX layer. - The end of `write_midx_internal()` is adjusted to move non-incremental MIDX files when necessary (i.e. when creating an incremental chain with an existing non-incremental MIDX in the repository). There are a handful of other changes that are introduced, like new functions to clear incremental MIDX files that are unrelated to the current chain (using the same "keep_hash" mechanism as in the non-incremental case). The tests explicitly exercising the new incremental MIDX feature are relatively limited for two reasons: 1. Most of the "interesting" behavior is already thoroughly covered in t5319-multi-pack-index.sh, which handles the core logic of reading objects through a MIDX. The new tests in t5334-incremental-multi-pack-index.sh are mostly focused on creating and destroying incremental MIDXs, as well as stitching their results together across layers. 2. A new GIT_TEST environment variable is added called "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the entire test suite to write incremental MIDXs after repacking when combined with the "GIT_TEST_MULTI_PACK_INDEX" variable. This exercises the long tail of other interesting behavior that is defined implicitly throughout the rest of the CI suite. It is likewise added to the linux-TEST-vars job. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:39 -07:00
Taylor Blau	147c3f6740	t/t5313-pack-bounds-checks.sh: prepare for sub-directories Prepare for sub-directories to appear in $GIT_DIR/objects/pack by adjusting the copy, remove, and chmod invocations to perform their behavior recursively. This prepares us for the new $GIT_DIR/objects/pack/multi-pack-index.d directory which will be added in a following commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:39 -07:00
Taylor Blau	9552c3595a	t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Two years ago, commit `ff1e653c8e` (midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new environment variable which caused the test suite to write MIDX bitmaps after any 'git repack' invocation. At the time, this was done to help flush out any bugs with MIDX bitmaps that weren't explicitly covered in the t5326-multi-pack-bitmap.sh script. Two years later, that flag has served us well and is no longer providing meaningful coverage, as the script in t5326 has matured substantially and covers many more interesting cases than it did back when `ff1e653c8e` was originally written. Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable as it is no longer serving a useful purpose. More importantly, removing this variable clears the way for us to introduce a new one to help similarly flush out bugs related to incremental MIDX chains. Because these incremental MIDX chains are (for now) incompatible with MIDX bitmaps, we cannot have both. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:38 -07:00
Taylor Blau	3592796d0a	midx: implement verification support for incremental MIDXs Teach the verification implementation used by `git multi-pack-index verify` to perform verification for incremental MIDX chains by independently validating each layer within the chain. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:38 -07:00
Taylor Blau	b80236d0e3	midx: support reading incremental MIDX chains Now that the MIDX machinery's internals have been taught to understand incremental MIDXs over the previous handful of commits, the MIDX machinery itself can begin reading incremental MIDXs. (Note that while the on-disk format for incremental MIDXs has been defined, the writing end has not been implemented. This will take place in the commit after next.) The core of this change involves following the order specified in the MIDX chain in reverse and opening up MIDXs in the chain one-by-one, adding them to the previous layer's `->base_midx` pointer at each step. In order to implement this, the `load_multi_pack_index()` function is taught to call a new `load_multi_pack_index_chain()` function if loading a non-incremental MIDX failed via `load_multi_pack_index_one()`. When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in the file one-by-one and dispatches calls to `load_multi_pack_index_one()` to read each layer of the MIDX chain. When a layer was successfully read, it is added to the MIDX chain by calling `add_midx_to_chain()` which validates the contents of the `BASE` chunk, performs some bounds checks on the number of combined packs and objects, and attaches the new MIDX by assigning its `base_midx` pointer to the existing part of the chain. As a supplement to this, introduce a new mode in the test-read-midx test-tool which allows us to read the information for a specific MIDX in the chain by specifying its trailing checksum via the command-line arguments like so: $ test-tool read-midx .git/objects [checksum] Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:38 -07:00
Taylor Blau	97fd770ea1	midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs The function `midx_fanout_add_midx_fanout()` is used to help construct the fanout table when generating a MIDX by reusing data from an existing MIDX. Prepare this function to work with incremental MIDXs by making a few changes: - The bounds checks need to be adjusted to start object lookups taking into account the number of objects in the previous MIDX layer (i.e., by starting the lookups at position `m->num_objects_in_base` instead of position 0). - Likewise, the bounds checks need to end at `m->num_objects_in_base` objects after `m->num_objects`. - Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier MIDX layers when dealing with an incremental MIDX chain by calling itself when given a MIDX with a non-NULL `base_midx`. Note that after `0c5a62f14b` (midx-write.c: do not read existing MIDX with `packs_to_include`, 2024-06-11), we do not use this function with an existing MIDX (incremental or not) when generating a MIDX with --stdin-packs, and likewise for incremental MIDXs. But it is still used when adding the fanout table from an incremental MIDX when generating a non-incremental MIDX (without --stdin-packs, of course). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:38 -07:00
Taylor Blau	b31f2aac56	midx: teach `midx_preferred_pack()` about incremental MIDXs The function `midx_preferred_pack()` is used to determine the identity of the preferred pack, which is the identity of a unique pack within the MIDX which is used as a tie-breaker when selecting from which pack to represent an object that appears in multiple packs within the MIDX. Historically we have said that the MIDX's preferred pack has the unique property that all objects from that pack are represented in the MIDX. But that isn't quite true: a more precise statement would be that all objects from that pack which appear in the MIDX are selected from that pack. This helps us extend the concept of preferred packs across a MIDX chain, where some object(s) in the preferred pack may appear in other packs in an earlier MIDX layer, in which case those object(s) will not appear in a subsequent MIDX layer from either the preferred pack or any other pack. Extend the concept of preferred packs by using the pack which represents the object at the first position in MIDX pseudo-pack order belonging to the current MIDX layer (i.e., at position 'm->num_objects_in_base'). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:38 -07:00
Taylor Blau	853165c50a	midx: teach `midx_contains_pack()` about incremental MIDXs Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle has been cleaned up, teach the former about how to operate in an incremental MIDX-aware world in a similar fashion as in previous commits. Instead of using either of the two `midx_for_object()` or `midx_for_pack()` helpers, this function is split into two: one that determines whether a pack is contained in a single MIDX, and another which calls the former in a loop over all MIDXs. This approach does not require that we change any of the implementation in what is now `midx_contains_pack_1()` as it still operates over a single MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:37 -07:00
Taylor Blau	5d0ee3f675	midx: remove unused `midx_locate_pack()` Commit `307d75bbe6` (midx: implement `midx_locate_pack()`, 2023-12-14) introduced `midx_locate_pack()`, which was described at the time as a complement to the function `midx_contains_pack()` which allowed callers to determine where in the MIDX lexical order a pack appeared, as opposed to whether or not it was simply contained. `307d75bbe6` suggests that future patches would be added which would introduce callers for this new function, but none ever were, meaning the function has gone unused since its introduction. Clean this up by in effect reverting `307d75bbe6`, which removes the unused functions and inlines its definition back into `midx_contains_pack()`. (Looking back through the list archives when `307d75bbe6` was written, this was in preparation for this[1] patch from back when we had the concept of "disjoint" packs while developing multi-pack verbatim reuse. That concept was abandoned before the series was merged, but I never dropped what would become `307d75bbe6` from the series, leading to the state prior to this commit). [1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/ Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:37 -07:00
Taylor Blau	3b00e35108	midx: teach `fill_midx_entry()` about incremental MIDXs In a similar fashion as previous commits, teach the `fill_midx_entry()` function to work in a incremental MIDX-aware fashion. This function, unlike others which accept an index into either the lexical order of objects or packs, takes in an object_id, and attempts to fill a caller-provided 'struct pack_entry' with the remaining pieces of information about that object from the MIDX. The function uses `bsearch_midx()` which fills out the frame-local 'pos' variable, recording the given object_id's lexical position within the MIDX chain, if found (if no matching object ID was found, we'll return immediately without filling out the `pack_entry` structure). Once given that position, we jump back through the `->base_midx` pointer to ensure that our `m` points at the MIDX layer which contains the given object_id (and not an ancestor or descendant of it in the chain). Note that we can drop the bounds check "if (pos >= m->num_objects)" because `midx_for_object()` performs this check for us. After that point, we only need to make two special considerations within this function: - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()` is a position in the concatenated lexical order of packs, so we must ensure that we subtract `m->num_packs_in_base` before accessing the MIDX-local `packs` array. - Second, we must avoid translating the `pos` back to a MIDX-local index, since we use it as an argument to `nth_midxed_offset()` which expects a position relative to the concatenated lexical order of objects. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:37 -07:00
Taylor Blau	df7ede83be	midx: teach `nth_midxed_offset()` about incremental MIDXs In a similar fashion as in previous commits, teach the function `nth_midxed_offset()` about incremental MIDXs. The given object `pos` is used to find the containing MIDX, and translated back into a MIDX-local position by assigning the return value of `midx_for_object()` to it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:37 -07:00
Taylor Blau	88f309e095	midx: teach `bsearch_midx()` about incremental MIDXs Now that the special cases callers of `bsearch_midx()` have been dealt with, teach `bsearch_midx()` to handle incremental MIDX chains. The incremental MIDX-aware version of `bsearch_midx()` works by repeatedly searching for a given OID in each layer along the `->base_midx` pointer, stopping either when an exact match is found, or the end of the chain is reached. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:37 -07:00
Taylor Blau	3f5f1cff92	midx: introduce `bsearch_one_midx()` The `bsearch_midx()` function will be extended in a following commit to search for the location of a given object ID across all MIDXs in a chain (or the single non-chain MIDX if no chain is available). While most callers will naturally want to use the updated `bsearch_midx()` function, there are a handful of special cases that will want finer control and will only want to search through a single MIDX. For instance, the object abbreviation code, which cares about object IDs near to where we'd expect to find a match in a MIDX. In that case, we want to look at the nearby matches in each layer of the MIDX chain, not just a single one). Split the more fine-grained control out into a separate function called `bsearch_one_midx()` which searches only a single MIDX. At present both `bsearch_midx()` and `bsearch_one_midx()` have identical behavior, but the following commit will rewrite the former to be aware of incremental MIDXs for the remaining non-special case callers. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:36 -07:00
Taylor Blau	60750e1eb9	midx: teach `nth_bitmapped_pack()` about incremental MIDXs In a similar fashion as in previous commits, teach the function `nth_bitmapped_pack()` about incremental MIDXs by translating the given `pack_int_id` from the concatenated lexical order to a MIDX-local lexical position. When accessing the containing MIDX's array of packs, use the local pack ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset when accessing the data within that chunk. (Note that the both the call to prepare_midx_pack() and the assignment of bp->pack_int_id both care about the global pack_int_id, so avoid shadowing the given 'pack_int_id' parameter). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:36 -07:00
Taylor Blau	26afb5afa1	midx: teach `nth_midxed_object_oid()` about incremental MIDXs The function `nth_midxed_object_oid()` returns the object ID for a given object position in the MIDX lexicographic order. Teach this function to instead operate over the concatenated lexicographic order defined in an earlier step so that it is able to be used with incremental MIDXs. To do this, we need to both (a) adjust the bounds check for the given 'n', as well as record the MIDX-local position after chasing the `->base_midx` pointer to find the MIDX which contains that object. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:36 -07:00
Taylor Blau	1820bd878c	midx: teach `prepare_midx_pack()` about incremental MIDXs The function `prepare_midx_pack()` is part of the midx.h API and loads the pack identified by the MIDX-local 'pack_int_id'. This patch prepares that function to be aware of an incremental MIDX world. To do this, introduce the second of the two general purpose helpers mentioned in the previous commit. This commit introduces `midx_for_pack()`, which is the pack-specific analog of `midx_for_object()`, and works in the same fashion. Like `midx_for_object()`, this function chases down the '->base_midx' field until it finds the MIDX layer within the chain that contains the given pack. Use this function within `prepare_midx_pack()` so that the `pack_int_id` it expects is now relative to the entire MIDX chain, and that it prepares the given pack in the appropriate MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:36 -07:00
Taylor Blau	19419821ba	midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs The function `nth_midxed_pack_int_id()` takes in a object position in MIDX lexicographic order and returns an identifier of the pack from which that object was selected in the MIDX. Currently, the given object position is an index into the lexicographic order of objects in a single MIDX. Change this position to instead refer into the concatenated lexicographic order of all MIDXs in a MIDX chain. This has two visible effects within the implementation of `prepare_midx_pack()`: - First, the given position is now an index into the concatenated lexicographic order of all MIDXs in the order in which they appear in the MIDX chain. - Second the pack ID returned from this function is now also in the concatenated order of packs among all layers of the MIDX chain in the same order that they appear in the MIDX chain. To do this, introduce the first of two general purpose helpers, this one being `midx_for_object()`. `midx_for_object()` takes a double pointer to a `struct multi_pack_index` as well as an object `pos` in terms of the entire MIDX chain[^1]. The function chases down the '->base_midx' field until it finds the MIDX layer within the chain that contains the given object. It then: - modifies the double pointer to point to the containing MIDX, instead of the tip of the chain, and - returns the MIDX-local position[^2] at which the given object can be found. Use this function within `nth_midxed_pack_int_id()` so that the `pos` it expects is now relative to the entire MIDX chain, and that it returns the appropriate pack position for that object. [^1]: As a reminder, this means that the object is identified among the objects contained in all layers of the incremental MIDX chain, not any particular layer. For example, consider MIDX chain with two individual MIDXs, one with 4 objects and another with 3 objects. If the MIDX with 4 objects appears earlier in the chain, then asking for object 6 would return the second object in the MIDX with 3 objects. [^2]: Building on the previous example, asking for object 6 in a MIDX chain with (4, 3) objects, respectively, this would set the double pointer to point at the MIDX containing three objects, and would return an index to the second object within that MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:36 -07:00
Taylor Blau	2678a73009	midx: add new fields for incremental MIDX chains The incremental MIDX chain feature is designed around the idea of indexing into a concatenated lexicographic ordering of object IDs present in the MIDX. When given an object position, the MIDX machinery needs to be able to locate both (a) which MIDX layer contains the given object, and (b) at what position within that MIDX layer that object appears. To do this, three new fields are added to the `struct multi_pack_index`: - struct multi_pack_index *base_midx; - uint32_t num_objects_in_base; - uint32_t num_packs_in_base; These three fields store the pieces of information suggested by their respective field names. In turn, the `num_objects_in_base` and `num_packs_in_base` fields are used to crawl backwards along the `base_midx` pointer to locate the appropriate position for a given object within the MIDX that contains it. The following commits will update various parts of the MIDX machinery (as well as their callers from outside of midx.c and midx-write.c) to be aware and make use of these fields when performing object lookups. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:36 -07:00
Taylor Blau	6eb1a7d7b0	Documentation: describe incremental MIDX format Prepare to implement incremental multi-pack indexes (MIDXs) over the next several commits by first describing the relevant prerequisites (like a new chunk in the MIDX format, the directory structure for incremental MIDXs, etc.) The format is described in detail in the patch contents below, but the high-level description is as follows. Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and each `*.midx` within that directory has a single "parent" MIDX, which is the MIDX layer immediately before it in the MIDX chain. The chain order resides in a file 'multi-pack-index-chain' in the same directory. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:35 -07:00
Junio C Hamano	6caa96c204	t3206: test_when_finished before dirtying operations, not after Many existing tests in this script perform operation(s) and then use test_when_finished to define how to undo the effect of the operation(s). This is backwards. When your operation(s) fail before you manage to successfully call test_when_finished (remember, that these commands must be all &&-chained, so a failure of an earlier operation mean your test_when_finished may not be executed at all). You must establish how to clean up your mess with test_when_finished before you create the mess to be cleaned up. Also make sure that the body of test_when_finished deals with case where the cruft it wants to remove failed to be created, by using "rm -f" (instead of "rm") to remove potential cruft files, and having "\|\| :" after "git notes remove" to remove potential cruft notes---both of these by default fail when asked to remove something that does not exist, instead of being silently idempotent no-ops. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 10:05:05 -07:00

... 14 15 16 17 18 ...

75032 Commits