Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:
1. We currently don't bother calling fsck_object() at all
for blobs, since it has traditionally been a noop. We'd
actually catch these blobs in fsck_finish() at the end,
but it's more efficient to check them when we already
have the object loaded in memory.
2. The second pass done by fsck_finish() needs to access
the objects, but we're actually indexing the pack in
this process. In theory we could give the fsck code a
special callback for accessing the in-pack data, but
it's actually quite tricky:
a. We don't have an internal efficient index mapping
oids to packfile offsets. We only generate it on
the fly as part of writing out the .idx file.
b. We'd still have to reconstruct deltas, which means
we'd basically have to replicate all of the
reading logic in packfile.c.
Instead, let's avoid running fsck_finish() until after
we've written out the .idx file, and then just add it
to our internal packed_git list.
This does mean that the objects are "in the repository"
before we finish our fsck checks. But unpack-objects
already exhibits this same behavior, and it's an
acceptable tradeoff here for the same reason: the
quarantine mechanism means that pushes will be
fully protected.
In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.
This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.
Signed-off-by: Jeff King <peff@peff.net>
As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.
This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.
This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.
Signed-off-by: Jeff King <peff@peff.net>
Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.
With this, we can now catch the malicious case in t7415 with
git-fsck.
Signed-off-by: Jeff King <peff@peff.net>
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:
1. The fsck_object() function just returns success for any
blob. Let's a noop fsck_blob(), which we can fill in
with actual logic later.
2. The fsck_loose() function in builtin/fsck.c
just threw away blob content after loading it. Let's
hold onto it until after we've called fsck_object().
The easiest way to do this is to just drop the
parse_loose_object() helper entirely. Incidentally,
this also fixes a memory leak: if we successfully
loaded the object data but did not parse it, we would
have left the function without freeing it.
3. When fsck_loose() loads the object data, it
does so with a custom read_loose_object() helper. This
function streams any blobs, regardless of size, under
the assumption that we're only checking the sha1.
Instead, let's actually load blobs smaller than
big_file_threshold, as the normal object-reading
code-paths would do. This lets us fsck small files, and
a NULL return is an indication that the blob was so big
that it needed to be streamed, and we can pass that
information along to fsck_blob().
Signed-off-by: Jeff King <peff@peff.net>
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.
Signed-off-by: Jeff King <peff@peff.net>
* jk/submodule-name-verify-fix:
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_path: drop clever fallthrough
skip_prefix: add icase-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
path: match NTFS short names for more .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Note that this includes two bits of evil-merge:
- there's a new call to verify_path() that doesn't actually
have a mode available. It should be OK to pass "0" here,
since we're just manipulating the untracked cache, not an
actual index entry.
- the lstat() in builtin/update-index.c:update_one() needs
to be updated to handle the fsmonitor case (without this
it still behaves correctly, but does an unnecessary
lstat).
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Signed-off-by: Jeff King <peff@peff.net>
In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.
This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.
Signed-off-by: Jeff King <peff@peff.net>
Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).
Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:
1. What should the allowed syntax be?
It's tempting to reuse verify_path(), since submodule
names typically come from in-repo paths. But there are
two reasons not to:
a. It's technically more strict than what we need, as
we really care only about breaking out of the
$GIT_DIR/modules/ hierarchy. E.g., having a
submodule named "foo/.git" isn't actually
dangerous, and it's possible that somebody has
manually given such a funny name.
b. Since we'll eventually use this checking logic in
fsck to prevent downstream repositories, it should
be consistent across platforms. Because
verify_path() relies on is_dir_sep(), it wouldn't
block "foo\..\bar" on a non-Windows machine.
2. Where should we enforce it? These days most of the
.gitmodules reads go through submodule-config.c, so
I've put it there in the reading step. That should
cover all of the C code.
We also construct the name for "git submodule add"
inside the git-submodule.sh script. This is probably
not a big deal for security since the name is coming
from the user anyway, but it would be polite to remind
them if the name they pick is invalid (and we need to
expose the name-checker to the shell anyway for our
test scripts).
This patch issues a warning when reading .gitmodules
and just ignores the related config entry completely.
This will generally end up producing a sensible error,
as it works the same as a .gitmodules file which is
missing a submodule entry (so "submodule update" will
barf, but "git clone --recurse-submodules" will print
an error but not abort the clone.
There is one minor oddity, which is that we print the
warning once per malformed config key (since that's how
the config subsystem gives us the entries). So in the
new test, for example, the user would see three
warnings. That's OK, since the intent is that this case
should never come up outside of malicious repositories
(and then it might even benefit the user to see the
message multiple times).
Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.
Signed-off-by: Jeff King <peff@peff.net>
Now that we use generation numbers from the commit-graph, we must
ensure that all commits that exist in the commit-graph are loaded
from that file instead of from the object database. Since the
commit-graph file is only checked if core.commitGraph is true, we
must check the default config before we load any commits.
In the merge builtin, the config was checked after loading the HEAD
commit. This was due to the use of the global 'branch' when checking
merge-specific config settings.
Move the config load to be between the initialization of 'branch' and
the commit lookup.
Without this change, a fast-forward merge would hit a BUG("bad
generation skip") statement in commit.c during paint_down_to_common().
This is because the HEAD commit would be loaded with "infinite"
generation but then reached by commits with "finite" generation
numbers.
Add a test to t5318-commit-graph.sh that exercises this code path to
prevent a regression.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add --dissociate option to add and update commands, both clone helper commands
that already have the --reference option --dissociate pairs with.
Signed-off-by: Casey Fitzpatrick <kcghost@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strings allocated in `setup_unpack_trees_porcelain()` are never
freed. Provide a function `clear_unpack_trees_porcelain()` to do so and
call it where we use `setup_unpack_trees_porcelain()`. The only
non-trivial user is `unpack_trees_start()`, where we should place the
new call in `unpack_trees_finish()`.
We keep the string pointers in an array, mixing pointers to static
memory and memory that we allocate on the heap. We also keep several
copies of the individual pointers. So we need to make sure that we do
not free what we must not free and that we do not double-free. Let a
separate argv_array take ownership of all the strings we create so that
we can easily free them.
Zero the whole array of string pointers to make sure that we do not
leave any dangling pointers.
Note that we only take responsibility for the memory allocated in
`setup_unpack_trees_porcelain()` and not any other members of the
`struct unpack_trees_options`.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is another candidate for commit-slab. Keep Junio's observation in
code so we can search it later on when somebody wants to improve the
code.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of relying on commit->util to store the source string, let the
user provide a commit-slab to store the source strings in.
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The help command currently hard codes the list of guides and their
summary in C. Let's move this list to command-list.txt. This lets us
extract summary lines from Documentation/git*.txt. This also
potentially lets us list guides in git.txt, but I'll leave that for
now.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This lists all recognized commands [1] by category. The group order
follows closely git.txt.
[1] We may actually show commands that are not built (e.g. if you set
NO_PERL you don't have git-instaweb but it's still listed here). I
ignore the problem because on Linux a git package could be split
anyway. The "git-core" package may not contain git-instaweb even if
it's built because it may end up in a separate package. We can't know
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you run "config --blob" outside of a repository, then we
eventually try to resolve the blob name and hit a BUG().
Let's catch this earlier and provide a useful message.
Note that we could also catch this much lower in the stack,
in git_config_from_blob_ref(). That might cover other
callsites, too, but it's unclear whether those ones would
actually be bugs or not. So let's leave the low-level
functions to assume the caller knows what it's doing (and
BUG() if it turns out it doesn't).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Migrate all git_path_* functions that are defined in path.c to take a
repository argument. Unlike other patches in this series, do not use the
#define trick, as we rewrite the whole function, which is rather small.
This doesn't migrate all the functions, as other builtins have their own
local path functions defined using GIT_PATH_FUNC. So keep that macro
around to serve the other locations.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow callers of is_repository_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow callers of register_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach fetch to generate ref-prefixes, to be used for server-side
filtering of the ref-advertisement, based on the configured fetch
refspec ('remote.<name>.fetch') when no user provided refspec exists.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using protocol v2 a client constructs a list of ref-prefixes which
are sent across the wire so that the server can do server-side filtering
of the ref-advertisement. The logic that does this exists for both
fetch and push (even though no push support for v2 currently exists yet)
and is roughly the same so lets consolidate this logic and make it
general enough that it can be used for both the push and fetch cases.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'match_push_refs()' to take a 'struct refspec' as a parameter
instead of an array of 'const char *'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove 'transprot_verify_remote_names()' because all callers have
migrated to using 'struct refspec' which performs the same checks in
'parse_refspec()'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert send-pack.c to store refspecs in a 'struct refspec' instead of
as an array of 'const char *'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'transport_push()' to take a 'struct refspec' as a
parameter instead of an array of strings which represent
refspecs.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the refspecs in builtin/push.c to be stored in a 'struct
refspec' instead of being stored in a list of 'struct refspec_item's.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the error checking for using the "--mirror", "--all", and "--tags"
options earlier and explicitly check for the presence of the flags
instead of checking for a side-effect of the flag.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'query_refspecs()' to take a 'struct refspec' as a parameter instead
of a list of 'struct refspec_item'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'apply_refspecs()' to take a 'struct refspec' as a parameter instead
of a list of 'struct refspec_item'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'get_stale_heads()' to take a 'struct refspec' as a parameter instead
of a list of 'struct refspec_item'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'prune_refs()' to take a 'struct refspec' as a parameter instead
of a list of 'struct refspec_item'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'get_ref_map()' to take a 'struct refspec' as a parameter
instead of a list of 'struct refspec_item'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'do_fetch()' to take a 'struct refspec' as a parameter instead
of a list of 'struct refspec_item'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the refmap in builtin/fetch.c to be stored in a
'struct refspec'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove 'add_prune_tags_to_fetch_refspec()' function and instead have the
only caller directly add the tag refspec using 'refspec_append()'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the set of fetch refspecs stored in 'struct remote' to use
'struct refspec'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the set of push refspecs stored in 'struct remote' to use
'struct refspec'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert fast-export to use 'struct refspec' instead of using a list of
refspec_item's.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'cmd_clone()' to use 'refspec_item_init()' instead of relying on
the old 'parse_fetch_refspec()' to initialize a single refspec item.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert 'get_tracking_branch()' to use 'refspec_item_init()' instead of
the old 'parse_fetch_refspec()' function.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>