Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.
While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.
Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.
A few things worth noting:
- In the change to builtin/fast-import.c::stream_blob(), we can see
that by removing the explicit reference to
'the_hash_algo->unsafe_init_fn()', we are hardened against the
hashfile API changing away from the_hash_algo (or its unsafe
variant) in the future.
- The bulk-checkin code no longer needs to explicitly zero-initialize
the hashfile_checkpoint, since it is now done as a result of calling
'hashfile_checkpoint_init()'.
- Also in the bulk-checkin code, we add an additional call to
prepare_to_stream() outside of the main loop in order to initialize
'state->f' so we know which hash function implementation to use when
calling 'hashfile_checkpoint_init()'.
This is OK, since subsequent 'prepare_to_stream()' calls are noops.
However, we only need to call 'prepare_to_stream()' when we have the
HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
'prepare_to_stream()' does not assign 'state->f', so we have nothing
to initialize.
- Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
appropriately guarded.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.
Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the show_usage_and_exit_if_asked() helper we introduced
earlier, fix callers of usage() that want to show the help text when
explicitly asked by the end-user. The help text now goes to the
standard output stream for them.
These are the bog standard "if we got only '-h', then that is a
request for help" callers. Their
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(message);
are simply replaced with
show_usage_and_exit_if_asked(argc, argv, message);
With this, the built-ins tested by t0012 all send their help text to
their standard output stream, so the check in t0012 that was half
tightened earlier is now fully tightened to insist on standard error
stream being empty.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier "csum-file checksum does not have to be computed with
sha1dc" topic had a few code paths that had initialized an
implementation of a hash function to be used by an unmatching hash
by mistake, which have been corrected.
* ps/weak-sha1-for-tail-sum-fix:
ci: exercise unsafe OpenSSL backend
builtin/fast-import: fix segfault with unsafe SHA1 backend
bulk-checkin: fix segfault with unsafe SHA1 backend
Same as with the preceding commit, git-fast-import(1) is using the safe
variant to initialize a hashfile checkpoint. This leads to a segfault
when passing the checkpoint into the hashfile subsystem because it would
use the unsafe variants instead:
++ git --git-dir=R/.git fast-import --big-file-threshold=1
AddressSanitizer:DEADLYSIGNAL
=================================================================
==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0)
==577126==The signal is caused by a READ memory access.
==577126==Hint: address points to the zero page.
#0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
#1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
#2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
#3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
#4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2
#5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3
#6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5
#7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4
#8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4
#9 0x555555b1f493 in run_builtin ../git.c:480:11
#10 0x555555b1bfef in handle_builtin ../git.c:740:9
#11 0x555555b1e6f4 in run_argv ../git.c:807:4
#12 0x555555b1b87a in cmd_main ../git.c:947:19
#13 0x5555561649e6 in main ../common-main.c:64:11
#14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#16 0x555555772c84 in _start (git+0x21ec84)
==577126==Register values:
rax = 0x0000511000000cc0 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000
rdi = 0x0000000000000000 rsi = 0x00005070000009c0 rbp = 0x00005070000009c0 rsp = 0x00007fffffff5b30
r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30
r12 = 0x0000000000000000 r13 = 0x00007fffffff6b60 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
==577126==ABORTING
./test-lib.sh: line 1039: 577126 Aborted git --git-dir=R/.git fast-import --big-file-threshold=1 < input
error: last command exited with $?=134
not ok 167 - R: blob bigger than threshold
The segfault is only exposed in case the unsafe and safe backends are
different from one another.
Fix the issue by initializing the context with the unsafe SHA1 variant.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
Yet another "pass the repository through the callchain" topic.
* kn/midx-wo-the-repository:
midx: inline the `MIDX_MIN_SIZE` definition
midx: pass down `hash_algo` to functions using global variables
midx: pass `repository` to `load_multi_pack_index`
midx: cleanup internal usage of `the_repository` and `the_hash_algo`
midx-write: pass down repository to `write_midx_file[_only]`
write-midx: add repository field to `write_midx_context`
midx-write: use `revs->repo` inside `read_refs_snapshot`
midx-write: pass down repository to static functions
packfile.c: remove unnecessary prepare_packed_git() call
midx: add repository to `multi_pack_index` struct
config: make `packed_git_(limit|window_size)` non-global variables
config: make `delta_base_cache_limit` a non-global variable
packfile: pass down repository to `for_each_packed_object`
packfile: pass down repository to `has_object[_kept]_pack`
packfile: pass down repository to `odb_pack_name`
packfile: pass `repository` to static function in the file
packfile: use `repository` from `packed_git` directly
packfile: add repository to struct `packed_git`
"git fast-import" learned to reject paths with ".." and "." as
their components to avoid creating invalid tree objects.
* en/fast-import-verify-path:
t9300: test verification of renamed paths
fast-import: disallow more path components
fast-import: disallow "." and ".." path components
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The migration procedure between two ref backends has been optimized.
* ps/ref-backend-migration-optim:
reftable: rename scratch buffer
refs: adapt `initial_transaction` flag to be unsigned
reftable/block: optimize allocations by using scratch buffer
reftable/block: rename `block_writer::buf` variable
reftable/writer: optimize allocations by using a scratch buffer
refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
refs: skip collision checks in initial transactions
refs: use "initial" transaction semantics to migrate refs
refs/files: support symbolic and root refs in initial transaction
refs: introduce "initial" transaction flag
refs/files: move logic to commit initial transaction
refs: allow passing flags when setting up a transaction
The variables `packed_git_window_size` and `packed_git_limit` are global
config variables used in the `packfile.c` file. Since it is only used in
this file, let's change it from being a global config variable to a
local variable for the subsystem.
With this, we rid `packfile.c` from all global variable usage and this
means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from
the file.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pack_name` currently relies on the global variable
`the_repository`. To eliminate global variable usage in `packfile.c`, we
should progressively shift the dependency on the_repository to higher
layers.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.
We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently anyways use 'the_repository', we should be
OK with this change.
We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callee to pass the repository object down the
levels.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of just disallowing '.' and '..', make use of verify_path() to
ensure that fast-import will disallow anything we wouldn't allow into
the index, such as anything under .git/, .gitmodules as a symlink, or
a dos drive prefix on Windows.
Since a few fast-export and fast-import tests that tried to stress-test
the correct handling of quoting relied on filenames that fail
is_valid_win32_path(), such as spaces or periods at the end of filenames
or backslashes within the filename, turn off core.protectNTFS for those
tests to ensure they keep passing.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fast-import" can be tricked into a replace ref that maps an
object to itself, which is a useless thing to do.
* en/fast-import-avoid-self-replace:
fast-import: avoid making replace refs point to themselves
If a user specified e.g.
M 100644 :1 ../some-file
then fast-import previously would happily create a git history where
there is a tree in the top-level directory named "..", and with a file
inside that directory named "some-file". The top-level ".." directory
causes problems. While git checkout will die with errors and fsck will
report hasDotdot problems, the user is going to have problems trying to
remove the problematic file. Simply avoid creating this bad history in
the first place.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.
Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If someone replaces a commit with a modified version, then builds on
that commit, and then later decides to rewrite history in a format like
git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import
and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the
replacement did, then at the end you'd get a replace ref that points to
itself. For example:
$ git show-ref | grep replace
fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264
Git commands which pay attention to replace refs will die with an error
when a self-referencing replace ref is present:
$ git log
fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264
Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing. Unless users specify
--quiet, warn them when we delete such a replace ref.
Two notes about this patch:
* We are not ignoring the problematic update of the replace ref
(turning it into a no-op), we are replacing the update with a delete.
The logic here is that if the repository had a value for the replace
ref before fast-import was run, and the replace ref was explicitly
named in the fast-import stream, we don't want the replace ref to be
left with a pre-fast-import value.
* While loops with more than one element (e.g. refs/replace/A points
to B, and refs/replace/B points to A) are possible, they seem much
less plausible. It is pretty easy to create a sequence of
git-filter-repo commands that will trigger a self-referencing replace
ref, but I do not know how to trigger a scenario with a cycle length
greater than 1.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The find_sha1_pack() function has a few problems:
- it's badly named, since it works with any object hash
- it takes the hash as a bare pointer rather than an object_id struct
We can fix both of these easily, as all callers actually have a real
object_id anyway.
I also found the existence of this function somewhat confusing, as it is
about looking in an arbitrary set of linked packed_git structs. It's
good for things like dumb-http which are looking in downloaded remote
packs, and not our local packs. But despite the name, it is not a good
way to find the pack which contains a local object (it skips the use of
the midx, the pack mru list, and so on).
So let's also add an explanatory comment above the declaration that may
point people in the right direction.
I suspect the calls in fast-import.c, which use the packed_git list from
the repository struct, could actually just be using find_pack_entry().
But since we'd need to keep it anyway for dumb-http, I didn't dig
further there. If we eventually drop dumb-http support, then it might be
worth examining them to see if we can get rid of the function entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Instead of including USE_THE_REPOSITORY_VARIABLE by default on every
builtin, remove it from builtin.h and add it to all the builtins that
include builtin.h (by definition, that means all builtins/*.c).
Also, remove the include statement for repository.h since it gets
brought in through builtin.h.
The next step will be to migrate each builtin
from having to use the_repository.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to reduce the usage of the global the_repository, add a
parameter to builtin functions that will get passed a repository
variable.
This commit uses UNUSED on most of the builtin functions, as subsequent
commits will modify the actual builtins to pass the repository parameter
down.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use of API functions that implicitly depend on the_repository
object in the config subsystem has been rewritten to pass a
repository object through the callchain.
* ps/config-wo-the-repository:
config: hide functions using `the_repository` by default
global: prepare for hiding away repo-less config functions
config: don't depend on `the_repository` with branch conditions
config: don't have setters depend on `the_repository`
config: pass repo to functions that rename or copy sections
config: pass repo to `git_die_config()`
config: pass repo to `git_config_get_expiry_in_days()`
config: pass repo to `git_config_get_expiry()`
config: pass repo to `git_config_get_max_percent_split_change()`
config: pass repo to `git_config_get_split_index()`
config: pass repo to `git_config_get_index_threads()`
config: expose `repo_config_clear()`
config: introduce missing setters that take repo as parameter
path: hide functions using `the_repository` by default
path: stop relying on `the_repository` in `worktree_git_path()`
path: stop relying on `the_repository` when reporting garbage
hooks: remove implicit dependency on `the_repository`
editor: do not rely on `the_repository` for interactive edits
path: expose `do_git_common_path()` as `repo_common_pathv()`
path: expose `do_git_path()` as `repo_git_pathv()`
Refactor `git_die_config()` to accept a `struct repository` such that we
can get rid of the implicit dependency on `the_repository`. Rename the
function accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to
derive the hash function that shall be used. Require callers to pass in
the hash algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Updates to symbolic refs can now be made as a part of ref
transaction.
* kn/ref-transaction-symref:
refs: remove `create_symref` and associated dead code
refs: rename `refs_create_symref()` to `refs_update_symref()`
refs: use transaction in `refs_create_symref()`
refs: add support for transactional symref updates
refs: move `original_update_refname` to 'refs.c'
refs: support symrefs in 'reference-transaction' hook
files-backend: extract out `create_symref_lock()`
refs: accept symref values in `ref_transaction_update()`
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `ref_transaction_update()` obtains ref information and
flags to create a `ref_update` and add them to the transaction at hand.
To extend symref support in transactions, we need to also accept the
old and new ref targets and process it. This commit adds the required
parameters to the function and modifies all call sites.
The two parameters added are `new_target` and `old_target`. The
`new_target` is used to denote what the reference should point to when
the transaction is applied. Some functions allow this parameter to be
NULL, meaning that the reference is not changed.
The `old_target` denotes the value the reference must have before the
update. Some functions allow this parameter to be NULL, meaning that the
old value of the reference is not checked.
We also update the internal function `ref_transaction_add_update()`
similarly to take the two new parameters.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The former is somewhat imprecise. The latter became out of sync with the
behavior in e814c39c2f (fast-import: refactor parsing of spaces,
2014-06-18).
Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
NUL cannot appear in paths. Even disregarding filesystem path
limitations, the tree object format delimits with NUL, so such a path
cannot be encoded by Git.
When a quoted path is unquoted, it could possibly contain NUL from
"\000". Forbid it so it isn't truncated.
fast-import still has other issues with NUL, but those will be addressed
later.
Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strbuf in `note_change_n` is to copy the remainder of `p` before
potentially invalidating it when reading the next line. However, `p` is
not used after that point. It has been unused since the function was
created in a8dd2e7d2b (fast-import: Add support for importing commit
notes, 2009-10-09) and looks to be a fossil from adapting
`file_change_m`. Remove it.
Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since filerename was added in f39a946a1f (Support wholesale
directory renames in fast-import, 2007-07-09) and filecopy in b6f3481bb4
(Teach fast-import to recursively copy files/directories, 2007-07-15),
both have produced an error when the destination path is empty. Later,
when support for targeting the root directory with an empty string was
added in 2794ad5244 (fast-import: Allow filemodify to set the root,
2010-10-10), this had the effect of allowing the quoted empty string
(`""`), but forbidding its unquoted variant (``). This seems to have
been intended as simple data validation for parsing two paths, rather
than a syntax restriction, because it was not extended to the other
operations.
All other occurrences of paths (in filemodify, filedelete, the source of
filecopy and filerename, and ls) allow both.
For most of this feature's lifetime, the documentation has not
prescribed the use of quoted empty strings. In e5959106d6
(Documentation/fast-import: put explanation of M 040000 <dataref> "" in
context, 2011-01-15), its documentation was changed from “`<path>` may
also be an empty string (`""`) to specify the root of the tree” to “The
root of the tree can be represented by an empty string as `<path>`”.
Thus, we should assume that some front-ends have depended on this
behavior.
Remove this restriction for the destination paths of filecopy and
filerename and change tests targeting the root to test `""` and ``.
Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, one case would not write the path to the strbuf: when the
path is unquoted and at the end of the string. It was essentially
copy-on-write. However, with the logic simplification of the previous
commit, this case was eliminated and the strbuf is always populated.
Directly use the strbufs now instead of an alias.
Since this already changes all the lines that use the strbufs, rename
them from `uq` to be more descriptive. That they are unquoted is not
their most important property, so name them after what they carry.
Additionally, `file_change_m` no longer needs to copy the path before
reading inline data.
Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Path parsing in fast-import is inconsistent and many unquoting errors
are suppressed or not checked.
<path> appears in the grammar in these places:
filemodify ::= 'M' SP <mode> (<dataref> | 'inline') SP <path> LF
filedelete ::= 'D' SP <path> LF
filecopy ::= 'C' SP <path> SP <path> LF
filerename ::= 'R' SP <path> SP <path> LF
ls ::= 'ls' SP <dataref> SP <path> LF
ls-commit ::= 'ls' SP <path> LF
and fast-import.c parses them in five different ways:
1. For filemodify and filedelete:
Try to unquote <path>. If it unquotes without errors, use the
unquoted version; otherwise, treat it as literal bytes to the end of
the line (including any number of SP).
2. For filecopy (source) and filerename (source):
Try to unquote <path>. If it unquotes without errors, use the
unquoted version; otherwise, treat it as literal bytes up to, but not
including, the next SP.
3. For filecopy (dest) and filerename (dest):
Like 1., but an unquoted empty string is forbidden.
4. For ls:
If <path> starts with `"`, unquote it and report parse errors;
otherwise, treat it as literal bytes to the end of the line
(including any number of SP).
5. For ls-commit:
Unquote <path> and report parse errors.
(It must start with `"` to disambiguate from ls.)
In the first three, any errors from trying to unquote a string are
suppressed, so a quoted string that contains invalid escapes would be
interpreted as literal bytes. For example, `"\xff"` would fail to
unquote (because hex escapes are not supported), and it would instead be
interpreted as the byte sequence '"', '\\', 'x', 'f', 'f', '"', which is
certainly not intended. Some front-ends erroneously use their language's
standard quoting routine instead of matching Git's, which could silently
introduce escapes that would be incorrectly parsed due to this and lead
to data corruption.
The documentation states “To use a source path that contains SP the path
must be quoted.”, so it is expected that some implementations depend on
spaces being allowed in paths in the final position. Thus we have two
documented ways to parse paths, so simplify the implementation to that.
Now we have:
1. `parse_path_eol` for filemodify, filedelete, filecopy (dest),
filerename (dest), ls, and ls-commit:
If <path> starts with `"`, unquote it and report parse errors;
otherwise, treat it as literal bytes to the end of the line
(including any number of SP).
2. `parse_path_space` for filecopy (source) and filerename (source):
If <path> starts with `"`, unquote it and report parse errors;
otherwise, treat it as literal bytes up to, but not including, the
next SP. It must be followed by SP.
There remain two special cases: The dest <path> in filecopy and rename
cannot be an unquoted empty string (this will be addressed subsequently)
and <path> in ls-commit must be quoted to disambiguate it from ls.
Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
Some functions in Git's source code follow the convention that returning
a negative value indicates a fatal error, e.g. repository corruption.
Let's use this convention in `repo_in_merge_bases()` to report when one
of the specified commits is missing (i.e. when `repo_parse_commit()`
reports an error).
Also adjust the callers of `repo_in_merge_bases()` to handle such
negative return values.
Note: As of this patch, errors are returned only if any of the specified
merge heads is missing. Over the course of the next patches, missing
commits will also be reported by the `paint_down_to_common()` function,
which is called by `repo_in_merge_bases_many()`, and those errors will
be properly propagated back to the caller at that stage.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use mem_pool_calloc() to get a zeroed buffer instead of zeroing it
ourselves. This makes the code clearer and less repetitive.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/fast-import.c and tree-walk.c have almost identical version of
get_mode. The two functions started out the same but have diverged
slightly. The version in fast-import changed mode to a uint16_t to
save memory. The version in tree-walk started erroring if no mode was
present.
As far as I can tell both of these changes are valid for both of the
callers, so add the both changes and place the common parsing helper
in object.h
Rename the helper from get_mode to parse_mode so it does not
conflict with another helper named get_mode in diff-no-index.c
This will be used shortly in a new helper decode_tree_entry_raw
which is used to compute cmpatibility objects as part of
the sha256 transition.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The OpenSSL 3+ EVP API for SHA-* cannot support our prior use cases
supported by other SHA-* implementations. It has the following
differences:
1. ->init_fn is required before all use
2. struct assignments don't work and requires ->clone_fn
3. can't support ->update_fn after ->final_*fn
While fixing cases 1 and 2 is merely the matter of calling ->init_fn and
->clone_fn as appropriate, fixing case 3 requires calling ->final_*fn on
a temporary context that's cloned from the primary context.
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/ZPCL11k38PXTkFga@debian.me/
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Fixes: 3e440ea0ab ("sha256: avoid functions deprecated in OpenSSL 3+")
Fixes: bda9c12073 ("avoid SHA-1 functions deprecated in OpenSSL 3+")
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>