From 168a937bbcf74370267fe04f5368035948745baa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Thu, 11 Nov 2021 06:18:55 +0100 Subject: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid object type-path combinations, 2021-10-01). When fsck-ing blobs larger than core.bigFileThreshold, we'd free() a pointer to uninitialized memory. This issue would have been caught by SANITIZE=address, but since it involves core.bigFileThreshold, none of the existing tests in our test suite covered it. Running them with the "big_file_threshold" in "environment.c" changed to say "6" would have shown this failure, but let's add a dedicated test for this scenario based on Han Xin's report[1]. The bug was introduced between v9 and v10[2] of the fsck series merged in 061a21d36d8 (Merge branch 'ab/fsck-unexpected-type', 2021-10-25). 1. https://lore.kernel.org/git/20211111030302.75694-1-hanxin.hx@alibaba-inc.com/ 2. https://lore.kernel.org/git/cover-v10-00.17-00000000000-20211001T091051Z-avarab@gmail.com/ Reported-by: Han Xin Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- object-file.c | 2 ++ t/t1050-large.sh | 8 ++++++++ 2 files changed, 10 insertions(+) diff --git a/object-file.c b/object-file.c index 4c258703a0..9213a51721 100644 --- a/object-file.c +++ b/object-file.c @@ -2533,6 +2533,8 @@ int read_loose_object(const char *path, char hdr[MAX_HEADER_LEN]; unsigned long *size = oi->sizep; + *contents = NULL; + map = map_loose_object_1(the_repository, path, NULL, &mapsize); if (!map) { error_errno(_("unable to mmap %s"), path); diff --git a/t/t1050-large.sh b/t/t1050-large.sh index 4bab6a513c..6bc1d76fb1 100755 --- a/t/t1050-large.sh +++ b/t/t1050-large.sh @@ -17,6 +17,14 @@ test_expect_success setup ' export GIT_ALLOC_LIMIT ' +test_expect_success 'enter "large" codepath, with small core.bigFileThreshold' ' + test_when_finished "rm -rf repo" && + + git init --bare repo && + echo large | git -C repo hash-object -w --stdin && + git -C repo -c core.bigfilethreshold=4 fsck +' + # add a large file with different settings while read expect config do From 16235e3b1460ddd708995b4cc5028e49d47e3761 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Thu, 11 Nov 2021 06:18:56 +0100 Subject: [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In the preceding commit a free() of uninitialized memory regression in 96e41f58fe1 (fsck: report invalid object type-path combinations, 2021-10-01) was fixed, but we'd still have an issue with leaking memory from fsck_loose(). Let's fix that issue too. That issue was introduced in my 31deb28f5e0 (fsck: don't hard die on invalid object types, 2021-10-01). It can be reproduced under SANITIZE=leak with the test I added in 093fffdfbec (fsck tests: add test for fsck-ing an unknown type, 2021-10-01): ./t1450-fsck.sh --run=84 -vixd In some sense it's not a problem, we lost the same amount of memory in terms of things malloc'd and not free'd. It just moved from the "still reachable" to "definitely lost" column in valgrind(1) nomenclature[1], since we'd have die()'d before. But now that we don't hard die() anymore in the library let's properly free() it. Doing so makes this code much easier to follow, since we'll now have one function owning the freeing of the "contents" variable, not two. For context on that memory management pattern the read_loose_object() function was added in f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13) and subsequently used in c68b489e564 (fsck: parse loose object paths directly, 2017-01-13). The pattern of it being the task of both sides to free() the memory has been there in this form since its inception. 1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano --- builtin/fsck.c | 3 ++- object-file.c | 7 ++----- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 1a023914a7..97c7322c67 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -604,7 +604,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data) struct object *obj; enum object_type type = OBJ_NONE; unsigned long size; - void *contents; + void *contents = NULL; int eaten; struct object_info oi = OBJECT_INFO_INIT; struct object_id real_oid = *null_oid(); @@ -629,6 +629,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data) path); if (err < 0) { errors_found |= ERROR_OBJECT; + free(contents); return 0; /* keep checking other objects */ } diff --git a/object-file.c b/object-file.c index 9213a51721..ac3b7a61a1 100644 --- a/object-file.c +++ b/object-file.c @@ -2533,8 +2533,6 @@ int read_loose_object(const char *path, char hdr[MAX_HEADER_LEN]; unsigned long *size = oi->sizep; - *contents = NULL; - map = map_loose_object_1(the_repository, path, NULL, &mapsize); if (!map) { error_errno(_("unable to mmap %s"), path); @@ -2564,10 +2562,9 @@ int read_loose_object(const char *path, goto out; } if (check_object_signature(the_repository, expected_oid, - *contents, *size, oi->type_name->buf, real_oid)) { - free(*contents); + *contents, *size, + oi->type_name->buf, real_oid)) goto out; - } } ret = 0; /* everything checks out */