Commit Graph

82 Commits

Author SHA1 Message Date
246569bf83 Merge branch 'ps/hash-cleanup'
Further code clean-up on the use of hash functions.  Now the
context object knows what hash function it is working with.

* ps/hash-cleanup:
  global: adapt callers to use generic hash context helpers
  hash: provide generic wrappers to update hash contexts
  hash: stop typedeffing the hash context
  hash: convert hashing context to a structure
2025-02-10 10:18:31 -08:00
b83a2f9006 Merge branch 'kn/pack-write-with-reduced-globals'
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03 10:23:34 -08:00
caf17423d3 Merge branch 'tb/unsafe-hash-cleanup'
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.

* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-02-03 10:23:32 -08:00
0578f1e66a global: adapt callers to use generic hash context helpers
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:11 -08:00
7346e340f1 hash: stop typedeffing the hash context
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.

Drop the typedef and adapt all callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:10 -08:00
0cbcba5455 Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanup
* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-01-31 10:05:46 -08:00
a8dd3821fe csum-file: introduce hashfile_checkpoint_init()
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.

While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.

Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.

A few things worth noting:

  - In the change to builtin/fast-import.c::stream_blob(), we can see
    that by removing the explicit reference to
    'the_hash_algo->unsafe_init_fn()', we are hardened against the
    hashfile API changing away from the_hash_algo (or its unsafe
    variant) in the future.

  - The bulk-checkin code no longer needs to explicitly zero-initialize
    the hashfile_checkpoint, since it is now done as a result of calling
    'hashfile_checkpoint_init()'.

  - Also in the bulk-checkin code, we add an additional call to
    prepare_to_stream() outside of the main loop in order to initialize
    'state->f' so we know which hash function implementation to use when
    calling 'hashfile_checkpoint_init()'.

    This is OK, since subsequent 'prepare_to_stream()' calls are noops.
    However, we only need to call 'prepare_to_stream()' when we have the
    HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
    'prepare_to_stream()' does not assign 'state->f', so we have nothing
    to initialize.

  - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
    appropriately guarded.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
7653e9af9b pack-write: pass hash_algo to write_idx_file()
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.

Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.

Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
8244d01de6 pack-write: pass hash_algo to fixup_pack_header_footer()
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.

Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
7b39a128c8 Merge branch 'ps/the-repository'
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.

* ps/the-repository:
  match-trees: stop using `the_repository`
  graph: stop using `the_repository`
  add-interactive: stop using `the_repository`
  tmp-objdir: stop using `the_repository`
  resolve-undo: stop using `the_repository`
  credential: stop using `the_repository`
  mailinfo: stop using `the_repository`
  diagnose: stop using `the_repository`
  server-info: stop using `the_repository`
  send-pack: stop using `the_repository`
  serve: stop using `the_repository`
  trace: stop using `the_repository`
  pager: stop using `the_repository`
  progress: stop using `the_repository`
2025-01-21 08:44:54 -08:00
98422943f0 Merge branch 'ps/weak-sha1-for-tail-sum-fix'
An earlier "csum-file checksum does not have to be computed with
sha1dc" topic had a few code paths that had initialized an
implementation of a hash function to be used by an unmatching hash
by mistake, which have been corrected.

* ps/weak-sha1-for-tail-sum-fix:
  ci: exercise unsafe OpenSSL backend
  builtin/fast-import: fix segfault with unsafe SHA1 backend
  bulk-checkin: fix segfault with unsafe SHA1 backend
2025-01-01 09:21:14 -08:00
9218c0bfe1 bulk-checkin: fix segfault with unsafe SHA1 backend
In 1b9e9be8b4 (csum-file.c: use unsafe SHA-1 implementation when
available, 2024-09-26) we have converted our `struct hashfile` to use
the unsafe SHA1 backend, which results in a significant speedup. One
needs to be careful with how to use that structure now though because
callers need to consistently use either the safe or unsafe variants of
SHA1, as otherwise one can easily trigger corruption.

As it turns out, we have one inconsistent usage in our tree because we
directly initialize `struct hashfile_checkpoint::ctx` with the safe
variant of SHA1, but end up writing to that context with the unsafe
ones. This went unnoticed so far because our CI systems do not exercise
different hash functions for these two backends, and consequently safe
and unsafe variants are equivalent. But when using SHA1DC as safe and
OpenSSL as unsafe backend this leads to a crash an t1050:

    ++ git -c core.compression=0 add large1
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==1367==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x507000000db0 sp 0x7fffffff5690 T0)
    ==1367==The signal is caused by a READ memory access.
    ==1367==Hint: address points to the zero page.
        #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
        #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
        #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
        #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
        #4 0x555555b9905d in deflate_blob_to_pack ../bulk-checkin.c:286:4
        #5 0x555555b98ae9 in index_blob_bulk_checkin ../bulk-checkin.c:362:15
        #6 0x555555ddab62 in index_blob_stream ../object-file.c:2756:9
        #7 0x555555dda420 in index_fd ../object-file.c:2778:9
        #8 0x555555ddad76 in index_path ../object-file.c:2796:7
        #9 0x555555e947f3 in add_to_index ../read-cache.c:771:7
        #10 0x555555e954a4 in add_file_to_index ../read-cache.c:804:9
        #11 0x5555558b5c39 in add_files ../builtin/add.c:355:7
        #12 0x5555558b412e in cmd_add ../builtin/add.c:578:18
        #13 0x555555b1f493 in run_builtin ../git.c:480:11
        #14 0x555555b1bfef in handle_builtin ../git.c:740:9
        #15 0x555555b1e6f4 in run_argv ../git.c:807:4
        #16 0x555555b1b87a in cmd_main ../git.c:947:19
        #17 0x5555561649e6 in main ../common-main.c:64:11
        #18 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #19 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #20 0x555555772c84 in _start (git+0x21ec84)

    ==1367==Register values:
    rax = 0x0000511000001080  rbx = 0x0000000000000000  rcx = 0x000000000000000c  rdx = 0x0000000000000000
    rdi = 0x0000000000000000  rsi = 0x0000507000000db0  rbp = 0x0000507000000db0  rsp = 0x00007fffffff5690
     r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x00007ffff7a01a30
    r12 = 0x0000000000000000  r13 = 0x00007fffffff6b38  r14 = 0x00007ffff7ffd000  r15 = 0x00005555563b9910
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
    ==1367==ABORTING
    ./test-lib.sh: line 1023:  1367 Aborted                 git $config add large1
    error: last command exited with $?=134
    not ok 4 - add with -c core.compression=0

Fix the issue by using the unsafe variant instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:46:29 -08:00
727c71a112 tmp-objdir: stop using the_repository
Stop using `the_repository` in the "tmp-objdir" subsystem by passing
in the repostiroy when creating a new temporary object directory.

While we could trivially update the caller to pass in the hash algorithm
used by the index itself, we instead pass in `the_hash_algo`. This is
mostly done to stay consistent with the rest of the code in that file,
which isn't prepared to handle arbitrary repositories, either.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:31 -08:00
41f43b8243 global: mark code units that generate warnings with -Wsign-compare
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
a3673f4898 environment: make get_object_directory() accept a repository
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12 10:15:39 -07:00
c81dcf630c bulk-checkin: fix leaking state TODO
When flushing a bulk-checking to disk we also reset the `struct
bulk_checkin_packfile` state. But while we free some of its members,
others aren't being free'd, leading to memory leaks:

  - The temporary packfile name is not getting freed.

  - The `struct hashfile` only gets freed in case we end up calling
    `finalize_hashfile()`. There are code paths though where that is not
    the case, namely when nothing has been written. For this, we need to
    make `free_hashfile()` public.

Fix those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:57 -07:00
e7da938570 global: introduce USE_THE_REPOSITORY_VARIABLE macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
eea0e59ffb treewide: remove unnecessary includes in source files
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:31 -08:00
3df51ea0a5 Merge branch 'eb/limit-bulk-checkin-to-blobs'
The "streaming" interface used for bulk-checkin codepath has been
narrowed to take only blob objects for now, with no real loss of
functionality.

* eb/limit-bulk-checkin-to-blobs:
  bulk-checkin: only support blobs in index_bulk_checkin
2023-10-10 11:39:14 -07:00
9eb5419799 bulk-checkin: only support blobs in index_bulk_checkin
As the code is written today index_bulk_checkin only accepts blobs.
Remove the enum object_type parameter and rename index_bulk_checkin to
index_blob_bulk_checkin, index_stream to index_blob_stream,
deflate_to_pack to deflate_blob_to_pack, stream_to_pack to
stream_blob_to_pack, to make this explicit.

Not supporting commits, tags, or trees has no downside as it is not
currently supported now, and commits, tags, and trees being smaller by
design do not have the problem that the problem that index_bulk_checkin
was built to solve.

Before we start adding code to support the hash function transition
supporting additional objects types in index_bulk_checkin has no real
additional cost, just an extra function parameter to know what the
object type is.  Once we begin the hash function transition this is not
the case.

The hash function transition document specifies that a repository with
compatObjectFormat enabled will compute and store both the SHA-1 and
SHA-256 hash of every object in the repository.

What makes this a challenge is that it is not just an additional hash
over the same object.  Instead the hash function transition document
specifies that the compatibility hash (specified with
compatObjectFormat) be computed over the equivalent object that another
git repository whose storage hash (specified with objectFormat) would
store.  When comparing equivalent repositories built with different
storage hash functions, the oids embedded in objects used to refer to
other objects differ and the location of signatures within objects
differ.

As blob objects have neither oids referring to other objects nor stored
signatures their storage hash and their compatibility hash are computed
over the same object.

The other kinds of objects: trees, commits, and tags, all store oids
referring to other objects.  Signatures are stored in commit and tag
objects.  As oids and the tags to store signatures are not the same size
in repositories built with different storage hashes the size of the
equivalent objects are also different.

A version of index_bulk_checkin that supports more than just blobs when
computing both the SHA-1 and the SHA-256 of every object added would
need a different, and more expensive structure.  The structure is more
expensive because it would be required to temporarily buffering the
equivalent object the compatibility hash needs to be computed over.

A temporary object is needed, because before a hash over an object can
computed it's object header needs to be computed.  One of the members of
the object header is the entire size of the object.  To know the size of
an equivalent object an entire pass over the original object needs to be
made, as trees, commits, and tags are composed of a variable number of
variable sized pieces.  Unfortunately there is no formula to compute the
size of an equivalent object from just the size of the original object.

Avoid all of those future complications by limiting index_bulk_checkin
to only work on blobs.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-26 10:17:56 -07:00
331f20d52d Merge branch 'ew/hash-with-openssl-evp'
Fix-up new-ish code to support OpenSSL EVP API.

* ew/hash-with-openssl-evp:
  treewide: fix various bugs w/ OpenSSL 3+ EVP API
2023-09-13 10:07:57 -07:00
e0b8c84240 treewide: fix various bugs w/ OpenSSL 3+ EVP API
The OpenSSL 3+ EVP API for SHA-* cannot support our prior use cases
supported by other SHA-* implementations.  It has the following
differences:

1. ->init_fn is required before all use
2. struct assignments don't work and requires ->clone_fn
3. can't support ->update_fn after ->final_*fn

While fixing cases 1 and 2 is merely the matter of calling ->init_fn and
->clone_fn as appropriate, fixing case 3 requires calling ->final_*fn on
a temporary context that's cloned from the primary context.

Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/ZPCL11k38PXTkFga@debian.me/
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Fixes: 3e440ea0ab ("sha256: avoid functions deprecated in OpenSSL 3+")
Fixes: bda9c12073 ("avoid SHA-1 functions deprecated in OpenSSL 3+")
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31 22:26:01 -07:00
91c080dff5 git-compat-util: move alloc macros to git-compat-util.h
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05 11:42:31 -07:00
da9502ff4d treewide: remove unnecessary includes for wrapper.h
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05 11:41:59 -07:00
a034e9106f object-store-ll.h: split this header out of object-store.h
The vast majority of files including object-store.h did not need dir.h
nor khash.h.  Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.

After this patch:
    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:54 -07:00
65156bb7ec treewide: remove double forward declaration of read_in_full
cache.h's nature of a dumping ground of includes prevented it from
being included in some compat/ files, forcing us into a workaround
of having a double forward declaration of the read_in_full() function
(see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to
fix a warning", 2007-11-17)).  Now that we have moved functions like
read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered
with unrelated and scary #defines, get rid of the extra forward
declaration and just have compat/pread.c include wrapper.h.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:11 -07:00
b6fdc44c84 treewide: remove cache.h inclusion due to object-file.h changes
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
87bed17907 object-file.h: move declarations for object-file.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
e7dca80692 Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04 08:25:52 -07:00
bc726bd075 cocci: apply the "object-store.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"object-store.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
ec2f026961 csum-file.h: remove unnecessary inclusion of cache.h
With the change in the last commit to move several functions to
write-or-die.h, csum-file.h no longer needs to include cache.h.
However, removing that include forces several other C files, which
directly or indirectly dependend upon csum-file.h's inclusion of
cache.h, to now be more explicit about their dependencies.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:55 -07:00
32a8f51061 environment.h: move declarations for environment.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:53 -07:00
f394e093df treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
41771fa435 cache.h: remove dependence on hex.h; make other files include it explicitly
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
36bf195890 alloc.h: move ALLOC_GROW() functions from cache.h
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places.  It does mean that we also need to add
includes of alloc.h in a number of C files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
ce50f1f3ac t5351: avoid relying on core.fsyncMethod = batch to be supported
On FreeBSD, this mode is not supported. But since 3a251bac0d (trace2:
only include "fsync" events if we git_fsync(), 2022-07-18) t5351 will
fail if this mode is unsupported.

Let's address this in the minimal fashion, by detecting that that mode
is unsupported and expecting a different count of hardware flushes in
that case.

This fixes the CI/PR builds on FreeBSD again.

Note: A better way would be to test only what is relevant in t5351.6
"unpack big object in stream (core.fsyncmethod=batch)" again instead of
blindly comparing the output against some exact text. But that would
pretty much revert the idea of above-mentioned commit, and that commit
was _just_ accepted into Git's main branch so one must assume that it
was intentional.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-29 09:08:57 -07:00
a50036da1a Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-06-03 14:30:37 -07:00
83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
1c573cdd72 pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
This structure will be used to communicate the per-object mtimes when
writing a cruft pack. Here, we need the full packing_data structure
because the mtime information is stored in an array there, not on the
individual object_entry's themselves (to avoid paying the overhead in
structure width for operations which do not generate a cruft pack).

We haven't passed this information down before because one of the two
callers (in bulk-checkin.c) does not have a packing_data structure at
all. In that case (where no cruft pack will be generated), NULL is
passed instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
c0f4752ed2 core.fsyncmethod: batched disk flushes for loose-objects
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
   writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the default
   today, but the user now has the option of syncing the index and there
   is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.

Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.

In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where dataloss doesn't occur.

In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a little higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add' Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
           disabled | 0.06  |  0.35 | 0.61
              fsync | 1.88  | 11.18 | 2.47
              batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 13:13:01 -07:00
2c23d1b477 bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
Make it clearer in the naming and documentation of the plug_bulk_checkin
and unplug_bulk_checkin APIs that they can be thought of as
a "transaction" to optimize operations on the object database. These
transactions may be nested so that subsystems like the cache-tree
writing code can optimize their operations without caring whether the
top-level code has a transaction active.

Add a flush_odb_transaction API that will be used in update-index to
make objects visible even if a transaction is active. The flush call may
also be useful in future cases if we hold a transaction active around
calling hooks.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 13:02:09 -07:00
897c9e2575 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
This commit prepares for adding batch-fsync to the bulk-checkin
infrastructure.

The bulk-checkin infrastructure is currently used to batch up addition
of large blobs to a packfile. When a blob is larger than
big_file_threshold, we unconditionally add it to a pack. If bulk
checkins are 'plugged', we allow multiple large blobs to be added to a
single pack until we reach the packfile size limit; otherwise, we simply
make a new packfile for each large blob. The 'unplug' call tells us when
the series of blob additions is done so that we can finish the packfiles
and make their objects available to subsequent operations.

Stated another way, bulk-checkin allows callers to define a transaction
that adds multiple objects to the object database, where the object
database can optimize its internal operations within the transaction
boundary.

Batched fsync will fit into bulk-checkin by taking advantage of the
plug/unplug functionality to determine the appropriate time to fsync
and make newly-added objects available in the primary object database.

* Rename 'state' variable to 'bulk_checkin_packfile', since we will
  later be adding 'bulk_fsync_objdir'. This also makes the variable
  easier to find in the debugger, since the name is more unique.

* Rename finish_bulk_checkin to flush_bulk_checkin_packfile and call it
  unconditionally from unplug_bulk_checkin. Internally it will
  conditionally do a flush if there's any work to do.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.

The net effect of these changes is to make a clear separation between
the portion of the bulk-checkin infrastructure that is related to the
packfile (nearly all of it at present) and the part that is related to
other future optimizations of the ODB.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 13:02:09 -07:00
eb804cd405 Merge branch 'ns/core-fsyncmethod'
Replace core.fsyncObjectFiles with two new configuration variables,
core.fsync and core.fsyncMethod.

* ns/core-fsyncmethod:
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-25 16:38:24 -07:00
020406eaa5 core.fsync: introduce granular fsync control infrastructure
This commit introduces the infrastructure for the core.fsync
configuration knob. The repository components we want to sync
are identified by flags so that we can turn on or off syncing
for specific components.

If core.fsyncObjectFiles is set and the core.fsync configuration
also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any
loose objects. This picks the strictest data integrity behavior
if core.fsync and core.fsyncObjectFiles are set to conflicting values.

This change introduces the currently unused fsync_component
helper, which will be used by a later patch that adds fsyncing to
the refs backend.

Actual configuration and documentation of the fsync components
list are in other patches in the series to separate review of
the underlying mechanism from the policy of how it's configured.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
b04cdea46c object-file API: add a format_object_header() function
Add a convenience function to wrap the xsnprintf() command that
generates loose object headers. This code was copy/pasted in various
parts of the codebase, let's define it in one place and re-use it from
there.

All except one caller of it had a valid "enum object_type" for us,
it's only write_object_file_prepare() which might need to deal with
"git hash-object --literally" and a potential garbage type. Let's have
the primary API use an "enum object_type", and define a *_literally()
function that can take an arbitrary "const char *" for the type.

See [1] for the discussion that prompted this patch, i.e. new code in
object-file.c that wanted to copy/paste the xsnprintf() invocation.

In the case of fast-import.c the callers unfortunately need to cast
back & forth between "unsigned char *" and "char *", since
format_object_header() ad encode_in_pack_object_header() take
different signedness.

1. https://lore.kernel.org/git/211213.86bl1l9bfz.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
2ec02dd5a8 pack-write: split up finish_tmp_packfile() function
Split up the finish_tmp_packfile() function and use the split-up version
in pack-objects.c in preparation for moving the step of renaming the
*.idx file later as part of a function change.

Since the only other caller of finish_tmp_packfile() was in
bulk-checkin.c, and it won't be needing a change to its *.idx renaming,
provide a thin wrapper for the old function as a static function in that
file. If other callers end up needing the simpler version it could be
moved back to "pack-write.c" and "pack.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 18:23:11 -07:00
66833f0e70 pack-write: refactor renaming in finish_tmp_packfile()
Refactor the renaming in finish_tmp_packfile() into a helper function.
The callers are now expected to pass a "name_buffer" ending in
"pack-OID." instead of the previous "pack-", we then append "pack",
"idx" or "rev" to it.

By doing the strbuf_setlen() in rename_tmp_packfile() we reuse the
buffer and avoid the repeated allocations we'd get if that function had
its own temporary "struct strbuf".

This approach of reusing the buffer does make the last user in
pack-object.c's write_pack_file() slightly awkward, since we needlessly
do a strbuf_setlen() before calling strbuf_release() for consistency. In
subsequent changes we'll move that bitmap writing code around, so let's
not skip the strbuf_setlen() now.

The previous strbuf_reset() idiom originated with 5889271114
(finish_tmp_packfile():use strbuf for pathname construction,
2014-03-03), which in turn was a minimal adjustment of pre-strbuf code
added in 0e990530ae (finish_tmp_packfile(): a helper function,
2011-10-28).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 18:23:11 -07:00
ae44b5a4f3 bulk-checkin.c: store checksum directly
finish_bulk_checkin() stores the checksum from finalize_hashfile() by
writing to the `hash` member of `struct object_id`, but that hash has
nothing to do with an object id (it's just a convenient location that
happens to be sized correctly).

Store the hash directly in an unsigned char array. This behaves the same
as writing to the `hash` member, but makes the intent clearer (and
avoids allocating an extra four bytes for the `algo` member of `struct
object_id`). It likewise prevents the possibility of a segfault when
reading `algo` (e.g., by calling `oid_to_hex()`) if it is uninitialized.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 18:23:11 -07:00
f96178c529 bulk-checkin: make buffer reuse more obvious and safer
ibuf can be reused for multiple iterations of the loop. Specifically:
deflate() overwrites s.avail_in to show how much of the input buffer
has not been processed yet - and sometimes leaves 'avail_in > 0', in
which case ibuf will be processed again during the loop's subsequent
iteration.

But if we declare ibuf within the loop, then (in theory) we get a new
(and uninitialised) buffer for every iteration. In practice, my compiler
seems to resue the same buffer - meaning that this code does work - but
it doesn't seem safe to rely on this behaviour. MSAN correctly catches
this issue - as soon as we hit the 's.avail_in > 0' condition, we end up
reading from what seems to be uninitialised memory.

Therefore, we move ibuf out of the loop, making this reuse safe.

See MSAN output from t1050-large below - the interesting part is the
ibuf creation at the end, although there's a lot of indirection before
we reach the read from unitialised memory:

==11294==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7f75db58fb1c in crc32_little crc32.c:283:9
    #1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
    #2 0x7f75db59668c in crc32 crc32.c:242:12
    #3 0x8c94f8 in hashwrite csum-file.c:101:15
    #4 0x825faf in stream_to_pack bulk-checkin.c:154:5
    #5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    #6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    #7 0xa7cff2 in index_stream object-file.c:2234:9
    #8 0xa7bff7 in index_fd object-file.c:2256:9
    #9 0xa7d22d in index_path object-file.c:2274:7
    #10 0xb3c8c9 in add_to_index read-cache.c:802:7
    #11 0xb3e039 in add_file_to_index read-cache.c:835:9
    #12 0x4a99c3 in add_files add.c:458:7
    #13 0x4a7276 in cmd_add add.c:670:18
    #14 0x4a1e76 in run_builtin git.c:461:11
    #15 0x49e1e7 in handle_builtin git.c:714:3
    #16 0x4a0c08 in run_argv git.c:781:4
    #17 0x49d5a8 in cmd_main git.c:912:19
    #18 0x7974da in main common-main.c:52:11
    #19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    #20 0x421bd9 in _start start.S:120

  Uninitialized value was stored to memory at
    #0 0x7f75db58fa6b in crc32_little crc32.c:283:9
    #1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
    #2 0x7f75db59668c in crc32 crc32.c:242:12
    #3 0x8c94f8 in hashwrite csum-file.c:101:15
    #4 0x825faf in stream_to_pack bulk-checkin.c:154:5
    #5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    #6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    #7 0xa7cff2 in index_stream object-file.c:2234:9
    #8 0xa7bff7 in index_fd object-file.c:2256:9
    #9 0xa7d22d in index_path object-file.c:2274:7
    #10 0xb3c8c9 in add_to_index read-cache.c:802:7
    #11 0xb3e039 in add_file_to_index read-cache.c:835:9
    #12 0x4a99c3 in add_files add.c:458:7
    #13 0x4a7276 in cmd_add add.c:670:18
    #14 0x4a1e76 in run_builtin git.c:461:11
    #15 0x49e1e7 in handle_builtin git.c:714:3
    #16 0x4a0c08 in run_argv git.c:781:4
    #17 0x49d5a8 in cmd_main git.c:912:19
    #18 0x7974da in main common-main.c:52:11
    #19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db5c2011 in flush_pending deflate.c:746:5
    #2 0x7f75db5cafa0 in deflate_stored deflate.c:1815:9
    #3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    #4 0xd80b7f in git_deflate zlib.c:244:12
    #5 0x825dff in stream_to_pack bulk-checkin.c:140:12
    #6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    #7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    #8 0xa7cff2 in index_stream object-file.c:2234:9
    #9 0xa7bff7 in index_fd object-file.c:2256:9
    #10 0xa7d22d in index_path object-file.c:2274:7
    #11 0xb3c8c9 in add_to_index read-cache.c:802:7
    #12 0xb3e039 in add_file_to_index read-cache.c:835:9
    #13 0x4a99c3 in add_files add.c:458:7
    #14 0x4a7276 in cmd_add add.c:670:18
    #15 0x4a1e76 in run_builtin git.c:461:11
    #16 0x49e1e7 in handle_builtin git.c:714:3
    #17 0x4a0c08 in run_argv git.c:781:4
    #18 0x49d5a8 in cmd_main git.c:912:19
    #19 0x7974da in main common-main.c:52:11

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db644241 in _tr_stored_block trees.c:873:5
    #2 0x7f75db5cad7c in deflate_stored deflate.c:1813:9
    #3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    #4 0xd80b7f in git_deflate zlib.c:244:12
    #5 0x825dff in stream_to_pack bulk-checkin.c:140:12
    #6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    #7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    #8 0xa7cff2 in index_stream object-file.c:2234:9
    #9 0xa7bff7 in index_fd object-file.c:2256:9
    #10 0xa7d22d in index_path object-file.c:2274:7
    #11 0xb3c8c9 in add_to_index read-cache.c:802:7
    #12 0xb3e039 in add_file_to_index read-cache.c:835:9
    #13 0x4a99c3 in add_files add.c:458:7
    #14 0x4a7276 in cmd_add add.c:670:18
    #15 0x4a1e76 in run_builtin git.c:461:11
    #16 0x49e1e7 in handle_builtin git.c:714:3
    #17 0x4a0c08 in run_argv git.c:781:4
    #18 0x49d5a8 in cmd_main git.c:912:19
    #19 0x7974da in main common-main.c:52:11

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db5c8fcf in deflate_stored deflate.c:1783:9
    #2 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    #3 0xd80b7f in git_deflate zlib.c:244:12
    #4 0x825dff in stream_to_pack bulk-checkin.c:140:12
    #5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    #6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    #7 0xa7cff2 in index_stream object-file.c:2234:9
    #8 0xa7bff7 in index_fd object-file.c:2256:9
    #9 0xa7d22d in index_path object-file.c:2274:7
    #10 0xb3c8c9 in add_to_index read-cache.c:802:7
    #11 0xb3e039 in add_file_to_index read-cache.c:835:9
    #12 0x4a99c3 in add_files add.c:458:7
    #13 0x4a7276 in cmd_add add.c:670:18
    #14 0x4a1e76 in run_builtin git.c:461:11
    #15 0x49e1e7 in handle_builtin git.c:714:3
    #16 0x4a0c08 in run_argv git.c:781:4
    #17 0x49d5a8 in cmd_main git.c:912:19
    #18 0x7974da in main common-main.c:52:11
    #19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db5ea545 in read_buf deflate.c:1181:5
    #2 0x7f75db5c97f7 in deflate_stored deflate.c:1791:9
    #3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    #4 0xd80b7f in git_deflate zlib.c:244:12
    #5 0x825dff in stream_to_pack bulk-checkin.c:140:12
    #6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    #7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    #8 0xa7cff2 in index_stream object-file.c:2234:9
    #9 0xa7bff7 in index_fd object-file.c:2256:9
    #10 0xa7d22d in index_path object-file.c:2274:7
    #11 0xb3c8c9 in add_to_index read-cache.c:802:7
    #12 0xb3e039 in add_file_to_index read-cache.c:835:9
    #13 0x4a99c3 in add_files add.c:458:7
    #14 0x4a7276 in cmd_add add.c:670:18
    #15 0x4a1e76 in run_builtin git.c:461:11
    #16 0x49e1e7 in handle_builtin git.c:714:3
    #17 0x4a0c08 in run_argv git.c:781:4
    #18 0x49d5a8 in cmd_main git.c:912:19
    #19 0x7974da in main common-main.c:52:11

  Uninitialized value was created by an allocation of 'ibuf' in the stack frame of function 'stream_to_pack'
    #0 0x825710 in stream_to_pack bulk-checkin.c:101

SUMMARY: MemorySanitizer: use-of-uninitialized-value crc32.c:283:9 in crc32_little
Exiting

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-11 13:01:22 +09:00
5951bf467e Use the final_oid_fn to finalize hashing of object IDs
When we're hashing a value which is going to be an object ID, we want to
zero-pad that value if necessary.  To do so, use the final_oid_fn
instead of the final_fn anytime we're going to create an object ID to
ensure we perform this operation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:38 +09:00