Commit Graph

8708 Commits

Author SHA1 Message Date
34e849b05a Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.

* jt/cdn-offload:
  upload-pack: fix a sparse '0 as NULL pointer' warning
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http-fetch: refactor into function
  http: refactor finish_http_pack_request()
  http: use --stdin when indexing dumb HTTP pack
2020-06-25 12:27:47 -07:00
10462829e3 Merge branch 'ss/submodule-set-branch-in-c'
Rewrite of parts of the scripted "git submodule" Porcelain command
continues; this time it is "git submodule set-branch" subcommand's
turn.

* ss/submodule-set-branch-in-c:
  submodule: port subcommand 'set-branch' from shell to C
2020-06-25 12:27:47 -07:00
7b2685ef2d Merge branch 'dl/branch-cleanup'
Code clean-up around "git branch" with a minor bugfix.

* dl/branch-cleanup:
  branch: don't mix --edit-description
  t3200: test for specific errors
  t3200: rename "expected" to "expect"
2020-06-25 12:27:47 -07:00
1457886ce2 Merge branch 'ct/diff-with-merge-base-clarification'
"git diff" used to take arguments in random and nonsense range
notation, e.g. "git diff A..B C", "git diff A..B C...D", etc.,
which has been cleaned up.

* ct/diff-with-merge-base-clarification:
  Documentation: usage for diff combined commits
  git diff: improve range handling
  t/t3430: avoid undefined git diff behavior
2020-06-25 12:27:46 -07:00
53674699c0 Merge branch 'en/clean-cleanups'
Code clean-up of "git clean" resulted in a fix of recent
performance regression.

* en/clean-cleanups:
  clean: optimize and document cases where we recurse into subdirectories
  clean: consolidate handling of ignored parameters
  dir, clean: avoid disallowed behavior
  dir: fix a few confusing comments
2020-06-25 12:27:45 -07:00
0cc1b475bb clone: use configured default branch name when appropriate
When cloning a repository without any branches, Git chooses a default
branch name for the as-yet unborn branch.

As part of the implicit initialization of the local repository, Git just
learned to respect `init.defaultBranch` to choose a different initial
branch name. We now really want that branch name to be used as a
fall-back.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
8747ebb7cd init: allow setting the default for the initial branch name via the config
We just introduced the command-line option
`--initial-branch=<branch-name>` to allow initializing a new repository
with a different initial branch than the hard-coded one.

To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each and every
`git init` invocation), let's introduce the `init.defaultBranch` config
setting.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Don Goodman-Wilson <don@goodman-wilson.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
32ba12dab2 init: allow specifying the initial branch name for the new repository
There is a growing number of projects and companies desiring to change
the main branch name of their repositories (see e.g.
https://twitter.com/mislav/status/1270388510684598272 for background on
this).

To change that branch name for new repositories, currently the only way
to do that automatically is by copying all of Git's template directory,
then hard-coding the desired default branch name into the `.git/HEAD`
file, and then configuring `init.templateDir` to point to those copied
template files.

To make this process much less cumbersome, let's introduce a new option:
`--initial-branch=<branch-name>`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
f0a96e8d4c submodule: fall back to remote's HEAD for missing remote.<name>.branch
When `remote.<name>.branch` is not configured, `git submodule update`
currently falls back to using the branch name `master`. A much better
idea, however, is to use the remote `HEAD`: on all Git servers running
reasonably recent Git versions, the symref `HEAD` points to the main
branch.

Note: t7419 demonstrates that there _might_ be use cases out there that
_expect_ `git submodule update --remote` to update submodules to the
remote `master` branch even if the remote `HEAD` points to another
branch. Arguably, this patch makes the behavior more intuitive, but
there is a slight possibility that this might cause regressions in
obscure setups.

Even so, it should be okay to fix this behavior without anything like a
longer transition period:

- The `git submodule update --remote` command is not really common.

- Current Git's behavior when running this command is outright
  confusing, unless the remote repository's current branch _is_ `master`
  (in which case the proposed behavior matches the old behavior).

- If a user encounters a regression due to the changed behavior, the fix
  is actually trivial: setting `submodule.<name>.branch` to `master`
  will reinstate the old behavior.

Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
d5bf91fde4 fast-export: add a "data" callback parameter to anonymize_str()
The anonymize_str() function takes a generator callback, but there's no
way to pass extra context to it. Let's add the usual "void *data"
parameter to the generator interface and pass it along.

This is mildly annoying for existing callers, all of which pass NULL,
but is necessary to avoid extra globals in some cases we'll add in a
subsequent patch.

While we're touching each of these callbacks, we can further observe
that none of them use the existing orig/len parameters at all. This
makes sense, since the point is for their output to have no discernable
basis in the original (my original version had some notion that we might
use a one-way function to obfuscate the names, but it was never
implemented). So let's drop those extra parameters. If a caller really
wants to do something with them, it can pass a struct through the new
data parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
6416a865da fast-export: move global "idents" anonymize hashmap into function
All of the other anonymization functions keep their static mappings
inside the function to avoid polluting the global namespace. Let's do
the same for "idents", as nobody needs it outside of
anonymize_ident_line().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
55b01456a9 fast-export: use a flex array to store anonymized entries
Now that we're using a separate keydata struct for hash lookups, we have
more flexibility in how we allocate anonymized_entry structs. Let's push
the "orig" key into a flex member within the struct. That should save us
a few bytes of memory per entry (a pointer plus any malloc overhead),
and may make lookups a little faster (since it's one less pointer to
chase in the comparison function).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
a0f65641df fast-export: stop storing lengths in anonymized hashmaps
Now that the anonymize_str() interface is restricted to NUL-terminated
strings, there's no need for us to keep track of the length of each
entry in the hashmap. This simplifies the code and saves a bit of
memory.

Note that we do still need to compare the stored results to partial
strings passed in by the callers. We can do that by using hashmap's
keydata feature to get the ptr/len pair into the comparison function,
and then using strncmp().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
7f40759496 fast-export: tighten anonymize_mem() interface to handle only strings
While the anonymize_mem() interface _can_ store arbitrary byte
sequences, none of the callers uses this feature (as of the previous
commit). We'd like to keep it that way, as we'll be exposing the
string-like nature of the anonymization routines to the user. So let's
tighten up the interface a bit:

  - don't treat "len" as an out-parameter from anonymize_mem(); this
    ensures callers treat the pointer result as a NUL-terminated string

  - likewise, don't treat "len" as an out-parameter from generator
    functions

  - swap out "void *" for "char *" as appropriate to signal that we
    don't handle arbitrary memory

  - rename the function to anonymize_str()

This will also open up some optimization opportunities in a future
patch.

Note that we can't drop the "len" parameter entirely. Some callers do
pass in partial strings (e.g., "foo/bar", len=3) to avoid copying, and
we need to handle those still.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
750bb32589 fast-export: store anonymized oids as hex strings
When fast-export stores anonymized oids, it does so as binary strings.
And while the anonymous mapping storage is binary-clean (at least as of
the previous commit), this will become awkward when we start exposing
more of it to the user. In particular, if we allow a method for
retaining token "foo", then users may want to specify a hex oid as such
a token.

Let's just switch to storing the hex strings. The difference in memory
usage is negligible (especially considering how infrequently we'd
generally store an oid compared to, say, path components).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
b897bf5f37 fast-export: use xmemdupz() for anonymizing oids
Our anonymize_mem() function is careful to take a ptr/len pair to allow
storing binary tokens like object ids, as well as partial strings (e.g.,
just "foo" of "foo/bar"). But it duplicates the hash key using
xstrdup()! That means that:

  - for a partial string, we'd store all bytes up to the NUL, even
    though we'd never look at anything past "len". This didn't produce
    wrong behavior, but was wasteful.

  - for a binary oid that doesn't contain a zero byte, we'd copy garbage
    bytes off the end of the array (though as long as nothing complained
    about reading uninitialized bytes, further reads would be limited by
    "len", and we'd produce the correct results)

  - for a binary oid that does contain a zero byte, we'd copy _fewer_
    bytes than intended into the hashmap struct. When we later try to
    look up a value, we'd access uninitialized memory and potentially
    falsely claim that a particular oid is not present.

The most common reason to store an oid is an anonymized gitlink, but our
test case doesn't have any gitlinks at all. So let's add one whose oid
contains a NUL and is present at two different paths. ASan catches the
memory error, but even without it we can detect the bug because the oid
is not anonymized the same way for both paths.

And of course the fix is to copy the correct number of bytes. We don't
technically need the appended NUL from xmemdupz(), but it doesn't hurt
as an extra protection against anybody treating it like a string (plus a
future patch will push us more in that direction).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
c592fd4c83 builtin/diff: fix botched update of usage comment
In the previous commit, an attempt was made to correct the "N=1, M=0"
case. However, the fix was botched and it introduced two half-correct
sections by mistake. Combine these half-correct sections into one fully
correct section.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 16:39:41 -07:00
c1ea625f72 commit-reach: avoid is_descendant_of() shim
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17)
adds a repository aware version of is_descendant_of() and a backward
compatibility shim that is barely used.

Update all callers to directly use the new repo_is_descendant_of()
function instead; making the codebase simpler and pushing more
the_repository references higher up the stack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 16:36:53 -07:00
9740ef888e Merge branch 'es/worktree-duplicate-paths'
The same worktree directory must be registered only once, but
"git worktree move" allowed this invariant to be violated, which
has been corrected.

* es/worktree-duplicate-paths:
  worktree: make "move" refuse to move atop missing registered worktree
  worktree: generalize candidate worktree path validation
  worktree: prune linked worktree referencing main worktree path
  worktree: prune duplicate entries referencing same worktree path
  worktree: make high-level pruning re-usable
  worktree: give "should be pruned?" function more meaningful name
  worktree: factor out repeated string literal
2020-06-22 15:55:03 -07:00
feea6946a5 diff-files: treat "i-t-a" files as "not-in-index"
The `diff-files' command and related commands which call the function
`cmd_diff_files()', consider the "intent-to-add" files as a part of the
index when comparing the work-tree against it. This was previously
addressed in commits [1] and [2] by turning the option
`--ita-invisible-in-index' (introduced in [3]) on by default.

For `diff-files' (and `add -p' as a consequence) to show the i-t-a
files as as new, `ita_invisible_in_index' will be enabled by default
here as well.

[1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default,
                2018-05-26)
[2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
                in index", 2016-10-24)
[3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:46:45 -07:00
03f2465bb1 worktree: drop get_worktrees() unused 'flags' argument
get_worktrees() accepts a 'flags' argument, however, there are no
existing flags (the lone flag GWT_SORT_LINKED was recently retired) and
no behavior which can be tweaked. Therefore, drop the 'flags' argument.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:31:15 -07:00
d9c54c2bbf worktree: drop get_worktrees() special-purpose sorting option
Of all the clients of get_worktrees(), only "git worktree list" wants
the list sorted in a very specific way; other clients simply don't care
about the order. Rather than imbuing get_worktrees() with special
knowledge about how various clients -- now and in the future -- may want
the list sorted, drop the sorting capability altogether and make it the
client's responsibility to sort the list if needed.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:30:29 -07:00
586740aa6e builtin/index-pack: add option to specify hash algorithm
git index-pack is usually run in a repository, but need not be. Since
packs don't contains information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository.  Since
using --stdin necessarily implies a repository, don't allow specifying
an object format if it's provided to prevent users from passing an
option that won't work.  Add documentation for this option.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
0c9a4f638a pull: plug minor memory leak after using is_descendant_of()
cmd_pull() builds a commit_list to pass a single potential ancestor to
is_descendant_of().  The latter leaves the list intact.  Release the
allocated memory after the call.

Leaking in cmd_*() isn't a big deal, but sets a bad example for other
users of is_descendant_of().

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 12:17:21 -07:00
a9d7689cd4 builtin/diff: update usage comment
A comment in cmd_diff() states that if one tree-ish and no blobs are
provided, (the "N=1, M=0" case), it will provide a diff between the tree
and the cache. This is incorrect because a diff happens between the
tree-ish and the working tree. Remove the `--cached` in the comment so
that the correct behavior is shown. Add a new section describing the
"N=1, M=0, --cached" behavior.

Next, describe the "N=0, M=0, --cached" case, similar to the above since
it is undocumented.

Finally, fix some spacing issues. Add spaces between each section for
consistency and readability. Also, change tabs within the comment into
spaces.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-18 15:01:15 -07:00
a554228ffb Merge branch 'en/sparse-checkout'
The behaviour of "sparse-checkout" in the state "git clone
--no-checkout" left was changed accidentally in 2.27, which has
been corrected.

* en/sparse-checkout:
  sparse-checkout: avoid staging deletions of all files
2020-06-17 21:54:02 -07:00
524caf8035 Merge branch 'js/reflog-anonymize-for-clone-and-fetch'
The reflog entries for "git clone" and "git fetch" did not
anonymize the URL they operated on.

* js/reflog-anonymize-for-clone-and-fetch:
  clone/fetch: anonymize URLs in the reflog
2020-06-17 21:54:01 -07:00
6da43d937c object: drop parsed_object_pool->commit_count
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.

However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.

Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.

As commit_count has no use anymore, let's also drop it from the struct.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 14:37:14 -07:00
dc44639904 branch: don't mix --edit-description
`git branch` accepts `--edit-description` in conjunction with other
arguments. However, `--edit-description` is its own mode, similar to
`--set-upstream-to`, which is also made mutually exclusive with other
modes. Prevent `--edit-description` from being mixed with other modes.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 11:12:34 -07:00
7233f17577 clean: optimize and document cases where we recurse into subdirectories
Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
    if (remove_directories)
        dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code.  Add some explanations to
the code to make it clearer.

Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean.  With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories.  git-2.26.0 clearly was not doing
that.  I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into.  As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out.  What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.

However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time.  Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report.  Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.

Reported-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 17:27:16 -07:00
f7f5c6c0ba clean: consolidate handling of ignored parameters
I spent a long time trying to figure out how and whether the code worked
with different values of ignore, ignore_only, and remove_directories.
After lots of time setting up lots of testcases, sifting through lots of
print statements, and walking through the debugger, I finally realized
that one piece of code related to how it was all setup was found in
clean.c rather than dir.c.  Make a change that would have made it easier
for me to do the extra testing by putting this handling in one spot.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 17:27:16 -07:00
351ea1c3cb dir, clean: avoid disallowed behavior
dir.h documented quite clearly that DIR_SHOW_IGNORED and
DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this
effect by the definition of both enum values.  However, a command like
   git clean -fx $DIR
would set both values for dir.flags.  I _think_ it happened to work
because:
  * As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect
    if DIR_SHOW_IGNORED_TOO is set.
  * As coded, I believe DIR_SHOW_IGNORED would just happen to take
    precedence over DIR_SHOW_IGNORED_TOO in the code as currently
    constructed.
Which is a long way of saying "we just got lucky".

Fix clean.c to avoid setting these mutually exclusive values at the same
time, and add a check to dir.c that will throw a BUG() to prevent anyone
else from making this mistake.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 17:27:16 -07:00
b7e10b2ca2 Documentation: usage for diff combined commits
Document the usage for producing combined commits with "git diff".
This includes updating the synopsis section.

While here, add the three-dot notation to the synopsis.

Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 10:53:44 -07:00
8bfcb3a690 git diff: improve range handling
When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.

To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense.  As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 10:53:44 -07:00
dd4b732df7 upload-pack: send part of packfile response as uri
Teach upload-pack to send part of its packfile response as URIs.

An administrator may configure a repository with one or more
"uploadpack.blobpackfileuri" lines, each line containing an OID, a pack
hash, and a URI. A client may configure fetch.uriprotocols to be a
comma-separated list of protocols that it is willing to use to fetch
additional packfiles - this list will be sent to the server. Whenever an
object with one of those OIDs would appear in the packfile transmitted
by upload-pack, the server may exclude that object, and instead send the
URI. The client will then download the packs referred to by those URIs
before performing the connectivity check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
9da69a6539 fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.

In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.

Implementation notes:

 - builtin/fetch-pack.c normally does not generate .keep files, and thus
   is unaffected by this or future changes. However, it has an
   undocumented "--lock-pack" feature, used by remote-curl.c when
   implementing the "fetch" remote helper command. In keeping with the
   remote helper protocol, only one "lock" line will ever be written;
   the rest will result in warnings to stderr. However, in practice,
   warnings will never be written because the remote-curl.c "fetch" is
   only used for protocol v0/v1 (which will not generate multiple .keep
   files). (Protocol v2 uses the "stateless-connect" command, not the
   "fetch" command.)

 - connected.c has an optimization in that connectivity checks on a ref
   need not be done if the target object is in a pack known to be
   self-contained and connected. If there are multiple packfiles, this
   optimization can no longer be done.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
810382ed37 worktree: make "move" refuse to move atop missing registered worktree
"git worktree add" takes special care to avoid creating a new worktree
at a location already registered to an existing worktree even if that
worktree is missing (which can happen, for instance, if the worktree
resides on removable media). "git worktree move", however, is not so
careful when validating the destination location and will happily move
the source worktree atop the location of a missing worktree. This leads
to the anomalous situation of multiple worktrees being associated with
the same path, which is expressly forbidden by design. For example:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]
    $ git worktree remove ../bar
    fatal: validation failed, cannot remove working tree:
        '.../bar' does not point back to '.git/worktrees/bar'

Fix this shortcoming by enhancing "git worktree move" to perform the
same additional validation of the destination directory as done by "git
worktree add".

While at it, add a test to verify that "git worktree move" won't move a
worktree atop an existing (non-worktree) path -- a restriction which has
always been in place but was never tested.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
d179af679b worktree: generalize candidate worktree path validation
"git worktree add" checks that the specified path is a valid location
for a new worktree by ensuring that the path does not already exist and
is not already registered to another worktree (a path can be registered
but missing, for instance, if it resides on removable media). Since "git
worktree add" is not the only command which should perform such
validation ("git worktree move" ought to also), generalize the the
validation function for use by other callers, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
916133ef8e worktree: prune linked worktree referencing main worktree path
"git worktree prune" detects when multiple entries are associated with
the same path and prunes the duplicates, however, it does not detect
when a linked worktree points at the path of the main worktree.
Although "git worktree add" disallows creating a new worktree with the
same path as the main worktree, such a case can arise outside the
control of Git even without the user mucking with .git/worktree/<id>/
administrative files. For instance:

    $ git clone foo.git
    $ git -C foo worktree add ../bar
    $ rm -rf bar
    $ mv foo bar
    $ git -C bar worktree list
    .../bar deadfeeb [master]
    .../bar deadfeeb [bar]

Help the user recover from such corruption by extending "git worktree
prune" to also detect when a linked worktree is associated with the path
of the main worktree.

Reported-by: Jonathan Müller <jonathanmueller.dev@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
4a3ce479ce worktree: prune duplicate entries referencing same worktree path
A fundamental restriction of linked working trees is that there must
only ever be a single worktree associated with a particular path, thus
"git worktree add" explicitly disallows creation of a new worktree at
the same location as an existing registered worktree. Nevertheless,
users can still "shoot themselves in the foot" by mucking with
administrative files in .git/worktree/<id>/. Worse, "git worktree move"
is careless[1] and allows a worktree to be moved atop a registered but
missing worktree (which can happen, for instance, if the worktree is on
removable media). For instance:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]

Help users recover from this form of corruption by teaching "git
worktree prune" to detect when multiple worktrees are associated with
the same path.

[1]: A subsequent commit will fix "git worktree move" validation to be
     more strict.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
dd9609a12e worktree: make high-level pruning re-usable
The low-level logic for removing a worktree is well encapsulated in
delete_git_dir(). However, high-level details related to pruning a
worktree -- such as dealing with verbosity and dry-run mode -- are not
encapsulated. Factor out this high-level logic into its own function so
it can be re-used as new worktree corruption detectors are added.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
1b14d40b38 worktree: give "should be pruned?" function more meaningful name
Readers of the name prune_worktree() are likely to expect the function
to actually prune a worktree, however, it only answers the question
"should this worktree be pruned?". Give it a name more reflective of its
true purpose to avoid such confusion.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
63e50b8678 Merge branch 'cb/bisect-helper-parser-fix'
The code to parse "git bisect start" command line was lax in
validating the arguments.

* cb/bisect-helper-parser-fix:
  bisect--helper: avoid segfault with bad syntax in `start --term-*`
2020-06-08 18:06:32 -07:00
b37fd14beb Merge branch 'dl/remote-curl-deadlock-fix'
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects.  The communication has
been updated to make it more robust.

* dl/remote-curl-deadlock-fix:
  stateless-connect: send response end packet
  pkt-line: define PACKET_READ_RESPONSE_END
  remote-curl: error on incomplete packet
  pkt-line: extern packet_length()
  transport: extract common fetch_pack() call
  remote-curl: remove label indentation
  remote-curl: fix typo
2020-06-08 18:06:30 -07:00
ded44afa02 Merge branch 'bc/filter-process'
Code simplification and test coverage enhancement.

* bc/filter-process:
  t2060: add a test for switch with --orphan and --discard-changes
  builtin/checkout: simplify metadata initialization
2020-06-08 18:06:30 -07:00
dc57a9be5e Merge branch 'tb/commit-graph-no-check-oids'
Clean-up the commit-graph codepath.

* tb/commit-graph-no-check-oids:
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  t5318: reorder test below 'graph_read_expect'
  commit-graph.c: simplify 'fill_oids_from_commits'
  builtin/commit-graph.c: dereference tags in builtin
  builtin/commit-graph.c: extract 'read_one_commit()'
  commit-graph.c: peel refs in 'add_ref_to_set'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: extract 'refs_cb_data'
2020-06-08 18:06:27 -07:00
c9b77f2cea worktree: factor out repeated string literal
For each worktree removed by "git worktree prune", it reports the reason
for the removal. All reasons share the common prefix "Removing
worktrees/%s:". As new removal reasons are added, this prefix needs to
be duplicated, which is error-prone and potentially cumbersome.
Therefore, factor out the common prefix.

Although this change seems to increase the "sentence lego quotient", it
should be reasonably safe, as the reason for removal is a distinct
clause, not strictly related to the prefix. Moreover, the "worktrees" in
"Removing worktrees/%s:" is a path literal which ought not be localized,
so by factoring it out, we can more easily avoid exposing that path
fragment to translators.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 13:31:27 -07:00
98564d8059 sparse-checkout: upgrade repository to version 1 when enabling extension
The 'extensions' configuration variable gets special meaning in the new
repository version, so when enabling the extension we should upgrade the
repository to version 1.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
01bbbbd9da fetch: allow adding a filter after initial clone
Retroactively adding a filter can be useful for existing shallow clones as
they allow users to see earlier change histories without downloading all
git objects in a regular --unshallow fetch.

Without this patch, users can make a clone partial by editing the
repository configuration to convert the remote into a promisor, like:

  git config core.repositoryFormatVersion 1
  git config extensions.partialClone origin
  git fetch --unshallow --filter=blob:none origin

Since the hard part of making this work is already in place and such
edits can be error-prone, teach Git to perform the required configuration
change automatically instead.

Note that this change does not modify the existing git behavior which
recognizes setting extensions.partialClone without changing
repositoryFormatVersion.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
b5bfc08a97 sparse-checkout: avoid staging deletions of all files
sparse-checkout's purpose is to update the working tree to have it
reflect a subset of the tracked files.  As such, it shouldn't be
switching branches, making commits, downloading or uploading data, or
staging or unstaging changes.  Other than updating the worktree, the
only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the
index.  In particular, this sets up a nice invariant: running
sparse-checkout will never change the status of any file in `git status`
(reflecting the fact that we only set the SKIP_WORKTREE bit if the file
is safe to delete, i.e. if the file is unmodified).

Traditionally, we did a _really_ bad job with this goal.  The
predecessor to sparse-checkout involved manual editing of
.git/info/sparse-checkout and running `git read-tree -mu HEAD`.  That
command would stage and unstage changes and overwrite dirty changes in
the working tree.

The initial implementation of the sparse-checkout command was no better;
it simply invoked `git read-tree -mu HEAD` as a subprocess and had the
same caveats, though this issue came up repeatedly in review comments
and workarounds for the problems were put in place before the feature
was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6].

[1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@mail.gmail.com/
[6] https://lore.kernel.org/git/20191121163706.GV23183@szeder.dev/

However, these workarounds, in addition to disabling the feature in a
number of important cases, also missed one special case.  I'll get back
to it later.

In the 2.27.0 cycle, the disabling of the feature was lifted by finally
replacing the internal equivalent of `git read-tree -mu HEAD` with
something that did what we wanted: the new update_sparsity() function in
unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index
and updates the working tree to match.  This new function handles all
the cases that were problematic for the old implementation, except that
it breaks the same special case that avoided the workarounds of the old
implementation, but broke it in a different way.

So...that brings us to the special case: a git clone performed with
--no-checkout.  As per the meaning of the flag, --no-checkout does not
check out any branch, with the implication that you aren't on one and
need to switch to one after the clone.  Implementationally, HEAD is
still set (so in some sense you are partially on a branch), but
  * the index is "unborn" (non-existent)
  * there are no files in the working tree (other than .git/)
  * the next time git switch (or git checkout) is run it will run
    unpack_trees with `initial_checkout` flag set to true.
It is not until you run, e.g. `git switch <somebranch>` that the index
will be written and files in the working tree populated.

With this special --no-checkout case, the traditional `read-tree -mu
HEAD` behavior would have done the equivalent of acting like checkout --
switch to the default branch (HEAD), write out an index that matches
HEAD, and update the working tree to match.  This special case slipped
through the avoid-making-changes checks in the original sparse-checkout
command and thus continued there.

After update_sparsity() was introduced and used (see commit f56f31af03
("sparse-checkout: use new update_sparsity() function", 2020-03-27)),
the behavior for the --no-checkout case changed:  Due to git's
auto-vivification of an empty in-memory index (see do_read_index() and
note that `must_exist` is false), and due to sparse-checkout's
update_working_directory() code to always write out the index after it
was done, we got a new bug.  That made it so that sparse-checkout would
switch the repository from a clone with an "unborn" index (i.e. still
needing an initial_checkout), to one that had a recorded index with no
entries.  Thus, instead of all the files appearing deleted in `git
status` being known to git as a special artifact of not yet being on a
branch, our recording of an empty index made it suddenly look to git as
though it was definitely on a branch with ALL files staged for deletion!
A subsequent checkout or switch then had to contend with the fact that
it wasn't on an initial_checkout but had a bunch of staged deletions.

Make sure that sparse-checkout changes nothing in the index other than
the SKIP_WORKTREE bit; in particular, when the index is unborn we do not
have any branch checked out so there is no sparsification or
de-sparsification work to do.  Simply return from
update_working_directory() early.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 08:05:50 -07:00