Commit Graph

59 Commits

Author SHA1 Message Date
6269f8eaad treewide: always have a valid "index_state.repo" member
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.

Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".

This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate->repo ? istate->repo : the_repository"
code in [2]. We can now simply use "istate->repo".

We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.

Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo->index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().

For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate->repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).

This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r->settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".

I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.

1. 1fd9ae517c (repository: add repo reference to index_state,
   2021-01-23)
2. ee1f0c242e (read-cache: add index.skipHash config option,
   2023-01-06)
3. 2f6b1eb794 (cache API: add a "INDEX_STATE_INIT" macro/function,
   add release_index(), 2023-01-12)
4. 1e0ea5c431 (fsmonitor: config settings are repository-specific,
   2022-03-25)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17 14:32:06 -08:00
2f6b1eb794 cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()
Hopefully in some not so distant future, we'll get advantages from always
initializing the "repo" member of the "struct index_state". To make
that easier let's introduce an initialization macro & function.

The various ad-hoc initialization of the structure can then be changed
over to it, and we can remove the various "0" assignments in
discard_index() in favor of calling index_state_init() at the end.

While not strictly necessary, let's also change the CALLOC_ARRAY() of
various "struct index_state *" to use an ALLOC_ARRAY() followed by
index_state_init() instead.

We're then adding the release_index() function and converting some
callers (including some of these allocations) over to it if they
either won't need to use their "struct index_state" again, or are just
about to call index_state_init().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-16 10:46:58 -08:00
666f53eb43 {builtin/*,repository}.c: add & use "USE_THE_INDEX_VARIABLE"
Split up the "USE_THE_INDEX_COMPATIBILITY_MACROS" into that setting
and a more narrow "USE_THE_INDEX_VARIABLE". In the case of these
built-ins we only need "the_index" variable, but not the compatibility
wrapper for functions we're not using.

Let's then have some users of "USE_THE_INDEX_COMPATIBILITY_MACROS" use
this more narrow and descriptive define.

For context: The USE_THE_INDEX_COMPATIBILITY_MACROS macro was added to
test-tool.h in f8adbec9fe (cache.h: flip
NO_THE_INDEX_COMPATIBILITY_MACROS switch, 2019-01-24).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21 12:06:15 +09:00
ccafbbfb4e Merge branch 'ab/plug-random-leaks'
Plug random memory leaks.

* ab/plug-random-leaks:
  repository.c: free the "path cache" in repo_clear()
  range-diff: plug memory leak in read_patches()
  range-diff: plug memory leak in common invocation
  lockfile API users: simplify and don't leak "path"
  commit-graph: stop fill_oids_from_packs() progress on error and free()
  commit-graph: fix memory leak in misused string_list API
  submodule--helper: fix trivial leak in module_add()
  transport: stop needlessly copying bundle header references
  bundle: call strvec_clear() on allocated strvec
  remote-curl.c: free memory in cmd_main()
  urlmatch.c: add and use a *_release() function
  diff.c: free "buf" in diff_words_flush()
  merge-base: free() allocated "struct commit **" list
  index-pack: fix memory leaks
2022-03-13 22:56:18 +00:00
759f340738 repository.c: free the "path cache" in repo_clear()
The "struct path_cache" added in 102de880d2 (path.c: migrate global
git_path_* to take a repository argument, 2018-05-17) is only used
directly by code in repository.[ch] (but populated in path.[ch]).

Let's move this code to repository.[ch], and stop leaking this memory
when we run repo_clear(). To avoid the cast change it from a "const
char *" to a "char *".

This also removes the "PATH_CACHE_INIT" macro, which has never been
used for anything. For the "struct repository" we already make a hard
assumption that it (and "the_repository") can be identically
initialized by making it a "static" variable, so making use of a
"PATH_CACHE_INIT" somewhere would have been confusing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-04 13:24:19 -08:00
af6a51875a repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
The fix is short (~30 lines), but the description is not.  Sorry.

There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts.  But first, we need to understand the
problems this class presents.  A quick outline:

   * Problems
     * User facing issues
     * Problem space complexity
     * Maintenance and code correctness challenges
   * SKIP_WORKTREE expectations in Git
   * Suggested solution
   * Pros/Cons of suggested solution
   * Notes on testcase modifications

=== User facing issues ===

There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index.  This may come from:
  * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps even cached in their editor)
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].

Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.

Further:
  * these files will degrade performance for the sparse-index case due
    to requiring the index to be expanded (see commit 55dfcf9591
    ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
    we try to delete entire directories outside the sparse cone).
  * these files will not be updated by by standard commands
    (switch/checkout/pull/merge/rebase will leave them alone unless
    conflicts happen -- and even then, the conflicted file may be
    written somewhere else to avoid overwriting the SKIP_WORKTREE file
    that is present and in the way)
  * there is nothing in Git that users can use to discover such
    files (status, diff, grep, etc. all ignore it)
  * there is no reasonable mechanism to "recover" from such a condition
    (neither `git sparse-checkout reapply` nor `git reset --hard` will
    correct it).

So, not only are users modifications ignored, but the files get
progressively more stale over time.  At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit.  These might include user-made local changes
from days before, but the only way to tell is to have users look through
them all closely.

If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes.  Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/

=== Problem space complexity ===

SKIP_WORKTREE has been part of Git for over a decade.  Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it.  Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the treat needs to be aware of an
additional type of file: tracked-but-not-present.  The extra type
results in lots of extra testcases and lots of extra code everywhere.

But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE).  Two types is a monumental amount of
effort to support, and adding a third feels a bit like insanity[4].

[4] Some examples of which can be seen at
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

=== Maintenance and code correctness challenges ===

Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to sanely figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7].  The corner cases with things like submodules and
SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start
becoming really complex[8].

We've had to add ugly logic to merge-ort to attempt to handle
present-despite-SKIP_WORKTREE files[9], and basically just been forced
to give up in merge-recursive knowing full well that we'll sometimes
silently discard user modifications.  Despite stash essentially being a
merge, it needed extra code (beyond what was in merge-ort and
merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid
a few different bugs that'd result in an early abort with a partial
stash application[10].

[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
    and the dates on the thread; also Matheus and I had several
    conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
    https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
    and quotes like "The core functionality of sparse-checkout has always
    been only partially implemented", a statement I still believe is true
    today.
[8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE
    handling with conflicted entries", 2021-03-20)
[10] See commit ba359fd507 ("stash: fix stash application in
     sparse-checkouts", 2020-12-01)

=== SKIP_WORKTREE expectations in Git ===

A couple quotes:

 * From [11] (before the "sparse-checkout" command existed):

   If it needs too many special cases, hacks, and conditionals, then it
   is not worth the complexity---if it is easier to write a correct code
   by allowing Git to populate working tree files, it is perfectly fine
   to do so.

   In a sense, the sparse checkout "feature" itself is a hack by itself,
   and that is why I think this part should be "best effort" as well.

 * From the git-sparse-checkout manual (still present today):

   THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
   COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
   THE FUTURE.

[11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/

=== Suggested solution ===

SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.

The suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.

Enforce this at index loading time by checking if core.sparseCheckout is
true; if so, check files in the index with the SKIP_WORKTREE bit set to
verify that they are absent from the working tree.  If they are present,
unset the bit (in memory, though any commands that write to the index
will record the update).

Users can, of course, can get the SKIP_WORKTREE bit back such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).

=== Pros/Cons of suggested solution ===

Pros:

  * Solves the user visible problems reported above, which I've been
    complaining about for nearly a year but couldn't find a solution to.
  * Helps prevent slow performance degradation with a sparse-index.
  * Much easier behavior in sparse-checkouts for users to reason about
  * Very simple, ~30 lines of code.
  * Significantly simplifies some ugly testcases, and obviates the need
    to test an entire class of potential issues.
  * Reduces code complexity, reasoning, and maintenance.  Avoids
    disagreements about weird corner cases[12].
  * It has been reported that some users might be (ab)using
    SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
    mechanism[13, and a few other similar references].  These users know
    of multiple caveats and shortcomings in doing so; perhaps not
    surprising given the "SKIP_WORKTREE expecations" section above.
    However, these users use `git update-index --skip-worktree`, and not
    `git sparse-checkout` or core.sparseCheckout=true.  As such, these
    users would be unaffected by this change and can continue abusing
    the system as before.

[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree

Cons:

  * When core.sparseCheckout is enabled, this adds a performance cost to
    reading the index.  I'll defer discussion of this cost to a subsequent
    patch, since I have some optimizations to add.

=== Notes on testcase modifications ===

The good:
  * t1011: Compare to two cases above it ('read-tree will not throw away
    dirty changes, non-sparse'); since the file is present, it should
    match the non-sparse case now
  * t1092: sparse-index & sparse-checkout now match full-worktree
    behavior in more cases!  Yaay for consistency!
  * t6428, t7012: look at how much simpler the tests become!  Merge and
    stash can just fail early telling the user there's a file in the
    way, instead of not noticing until it's about to write a file and
    then have to implement sudden crash avoidance.  Hurray for sanity!
  * t7817: sparse behavior better matches full tree behavior.  Hurray
    for sanity!

The confusing:
  * t3705: These changes were ONLY needed on Windows, but they don't
    hurt other platforms.  Let's discuss each individually:

    * core.sparseCheckout should be false by default.  Nothing in this
      testcase toggles that until many, many tests later.  However,
      early tests (#5 in particular) were testing `update-index
      --skip-worktree` behavior in a non-sparse-checkout, but the
      Windows tests in CI were behaving as if core.sparseCheckout=true
      had been specified somewhere.  I do not have access to a Windows
      machine.  But I just manually did what should have been a no-op
      and turned the config off.  And it fixed the test.
    * I have no idea why the leftover .gitattributes file from this
      test was causing failures for test #18 on Windows, but only with
      these changes of mine.  Test #18 was checking for empty stderr,
      and specifically wanted to know that some error completely
      unrelated to file endings did not appear.  The leftover
      .gitattributes file thus caused some spurious stderr unrelated to
      the thing being checked.  Since other tests did not intend to
      test normalization, just proactively remove the .gitattributes
      file.  I'm certain this is cleaner and better, I'm just unsure
      why/how this didn't trigger problems before.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 14:43:22 -08:00
0dc90d954d Merge branch 'ns/tmp-objdir'
New interface into the tmp-objdir API to help in-core use of the
quarantine feature.

* ns/tmp-objdir:
  tmp-objdir: disable ref updates when replacing the primary odb
  tmp-objdir: new API for creating temporary writable databases
2022-01-03 16:24:15 -08:00
ecd81dfc79 tmp-objdir: disable ref updates when replacing the primary odb
When creating a subprocess with a temporary ODB, we set the
GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
to update refs, since the tmp-objdir may go away.

Introduce a similar mechanism for in-process temporary ODBs when
we call tmp_objdir_replace_primary_odb. Now both mechanisms set
the disable_ref_updates flag on the odb, which is queried by
the ref_transaction_prepare function.

Peff's test case [1] was invoking ref updates via the cachetextconv
setting. That particular code silently does nothing when a ref
update is forbidden. See the call to notes_cache_put in
fill_textconv where errors are ignored.

[1] https://lore.kernel.org/git/YVOn3hDsb5pnxR53@coredump.intra.peff.net/

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-08 14:06:46 -08:00
fd3cb0501e remote: move static variables into per-repository struct
remote.c does not works with non-the_repository because it stores its
state as static variables. To support non-the_repository, we can use a
per-repository struct for the remotes subsystem.

Prepare for this change by defining a struct remote_state that holds
the remotes subsystem state and move the static variables of remote.c
into the_repository->remote_state.

This introduces no behavioral or API changes.

Signed-off-by: Glen Choo <chooglen@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18 22:31:19 -08:00
921c795c25 Merge branch 'jt/add-submodule-odb-clean-up'
More code paths that use the hack to add submodule's object
database to the set of alternate object store have been cleaned up.

* jt/add-submodule-odb-clean-up:
  revision: remove "submodule" from opt struct
  repository: support unabsorbed in repo_submodule_init
  submodule: remove unnecessary unabsorbed fallback
2021-10-06 13:40:11 -07:00
ce125d431a submodule: extract path to submodule gitdir func
We currently store each submodule gitdir in ".git/modules/<name>", but
this has problems with some submodule naming schemes, as described in a
comment in submodule_name_to_gitdir() in this patch.

Extract the determination of the location of a submodule's gitdir into
its own function submodule_name_to_gitdir(). For now, the problem
remains unsolved, but this puts us in a better position for finding a
solution.

This was motivated, at $DAYJOB, by a part of Android's repo hierarchy
[1]. In particular, there is a repo "build", and several repos of the
form "build/<name>".

This is based on earlier work by Brandon Williams [2].

[1] https://android.googlesource.com/platform/
[2] https://lore.kernel.org/git/20180808223323.79989-2-bmwill@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-15 12:59:12 -07:00
8eb8dcf946 repository: support unabsorbed in repo_submodule_init
In preparation for a subsequent commit that migrates code using
add_submodule_odb() to repo_submodule_init(), teach
repo_submodule_init() to support submodules with unabsorbed gitdirs.
(See the documentation for "git submodule absorbgitdirs" for more
information about absorbed and unabsorbed gitdirs.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
ef7dc2e9cc promisor-remote: support per-repository config
Instead of using global variables to store promisor remote information,
store this config in struct repository instead, and add
repository-agnostic non-static functions corresponding to the existing
non-static functions that only work on the_repository.

The actual lazy-fetching of missing objects currently does not work on
repositories other than the_repository, and will still not work after
this commit, so add a BUG message explaining this. A subsequent commit
will remove this limitation.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28 09:57:42 -07:00
ebaf3bcf1a repository: move global r_f_p_c to repo struct
Move repository_format_partial_clone, which is currently a global
variable, into struct repository. (Full support for per-repository
partial clone config will be done in a subsequent commit - this is split
into its own commit because of the extent of the changes needed.)

The new repo-specific variable cannot be set in
check_repository_format_gently() (as is currently), because that
function does not know which repo it is operating on (or even whether
the value is important); therefore this responsibility is delegated to
the outermost caller that knows. Of all the outermost callers that know
(found by looking at all functions that call clear_repository_format()),
I looked at those that either read from the main Git directory or write
into a struct repository. These callers have been modified accordingly
(write to the_repository in the former case and write to the given
struct repository in the latter case).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28 09:57:41 -07:00
8e97852919 Merge branch 'ds/sparse-index-protections'
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.

* ds/sparse-index-protections: (47 commits)
  name-hash: use expand_to_path()
  sparse-index: expand_to_path()
  name-hash: don't add directories to name_hash
  revision: ensure full index
  resolve-undo: ensure full index
  read-cache: ensure full index
  pathspec: ensure full index
  merge-recursive: ensure full index
  entry: ensure full index
  dir: ensure full index
  update-index: ensure full index
  stash: ensure full index
  rm: ensure full index
  merge-index: ensure full index
  ls-files: ensure full index
  grep: ensure full index
  fsck: ensure full index
  difftool: ensure full index
  commit: ensure full index
  checkout: ensure full index
  ...
2021-04-30 13:50:26 +09:00
3964fc2aae sparse-index: add guard to ensure full index
Upcoming changes will introduce modifications to the index format that
allow sparse directories. It will be useful to have a mechanism for
converting those sparse index files into full indexes by walking the
tree at those sparse directories. Name this method ensure_full_index()
as it will guarantee that the index is fully expanded.

This method is not implemented yet, and instead we focus on the
scaffolding to declare it and call it at the appropriate time.

Add a 'command_requires_full_index' member to struct repo_settings. This
will be an indicator that we need the index in full mode to do certain
index operations. This starts as being true for every command, then we
will set it to false as some commands integrate with sparse indexes.

If 'command_requires_full_index' is true, then we will immediately
expand a sparse index to a full one upon reading from disk. This
suffices for now, but we will want to add more callers to
ensure_full_index() later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:45 -07:00
ca56dadb4b use CALLOC_ARRAY
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13 16:00:09 -08:00
1fd9ae517c repository: add repo reference to index_state
It will be helpful to add behavior to index operations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.

Add a BUG() statement if the repo already has an index, and the index
already has a repo, but somehow the index points to a different repo.

This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23 17:14:07 -08:00
eff45daab8 repository: enable SHA-256 support by default
Now that we have a complete SHA-256 implementation in Git, let's enable
it so people can use it.  Remove the ENABLE_SHA256 define constant
everywhere it's used.  Add tests for initializing a repository with
SHA-256.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 09:16:49 -07:00
9412759925 repository: require a build flag to use SHA-256
At this point, SHA-256 support is experimental and some behavior may
change. To avoid surprising unsuspecting users, require a build flag,
ENABLE_SHA256, to allow use of a non-SHA-1 algorithm. Enable this flag
by default when the DEVELOPER make flag is set so that contributors can
test this case adequately.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:21 -08:00
15beaaa3d1 Fix spelling errors in code comments
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10 16:00:54 +09:00
6b5688b760 Merge branch 'ma/clear-repository-format'
The setup code has been cleaned up to avoid leaks around the
repository_format structure.

* ma/clear-repository-format:
  setup: fix memory leaks with `struct repository_format`
  setup: free old value before setting `work_tree`
2019-03-20 15:16:07 +09:00
e8805af1c3 setup: fix memory leaks with struct repository_format
After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.

Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a
function `clear_repository_format()`, to be used on each side of
`read_repository_format()`. To have a clear and simple memory ownership,
let all users of `struct repository_format` duplicate the strings that
they take from it, rather than stealing the pointers.

Call `clear_...()` at the start of `read_...()` instead of just zeroing
the struct, since we sometimes enter the function multiple times. Thus,
it is important to initialize the struct before calling `read_...()`, so
document that. It's also important because we might not even call
`read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c.

Teach `read_...()` to clear the struct on error, so that it is reset to
a safe state, and document this. (In `setup_git_directory_gently()`, we
look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
weren't actually supposed to do per the API. After this commit, that's
ok.)

We inherit the existing code's combining "error" and "no version found".
Both are signalled through `version == -1` and now both cause us to
clear any partial configuration we have picked up. For "extensions.*",
that's fine, since they require a positive version number. For
"core.bare" and "core.worktree", we're already verifying that we have a
non-negative version number before using them.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-01 08:52:00 +09:00
ee4512ed48 trace2: create new combined trace facility
Create a new unified tracing facility for git.  The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.

In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.

Trace2 defines 3 output targets.  These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT".  These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).

* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
  summary data.

* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
  It extends the output with columns for the command process, thread,
  repo, absolute and relative elapsed times.  It reports events for
  child process start/stop, thread start/stop, and per-thread function
  nesting.

* GIT_TR2_EVENT is a new structured format. It writes event data as a
  series of JSON records.

Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 15:27:59 -08:00
7589e63648 Merge branch 'nd/the-index-final'
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.

* nd/the-index-final:
  cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
  read-cache.c: remove the_* from index_has_changes()
  merge-recursive.c: remove implicit dependency on the_repository
  merge-recursive.c: remove implicit dependency on the_index
  sha1-name.c: remove implicit dependency on the_index
  read-cache.c: replace update_index_if_able with repo_&
  read-cache.c: kill read_index()
  checkout: avoid the_index when possible
  repository.c: replace hold_locked_index() with repo_hold_locked_index()
  notes-utils.c: remove the_repository references
  grep: use grep_opt->repo instead of explict repo argument
2019-02-06 22:05:23 -08:00
5d3635db19 Merge branch 'sb/submodule-recursive-fetch-gets-the-tip'
"git fetch --recurse-submodules" may not fetch the necessary commit
that is bound to the superproject, which is getting corrected.

* sb/submodule-recursive-fetch-gets-the-tip:
  fetch: ensure submodule objects fetched
  submodule.c: fetch in submodules git directory instead of in worktree
  submodule: migrate get_next_submodule to use repository structs
  repository: repo_submodule_init to take a submodule struct
  submodule: store OIDs in changed_submodule_names
  submodule.c: tighten scope of changed_submodule_names struct
  submodule.c: sort changed_submodule_names before searching it
  submodule.c: fix indentation
  sha1-array: provide oid_array_filter
2019-01-29 12:47:54 -08:00
f8adbec9fe cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
By default, index compat macros are off from now on, because they
could hide the_index dependency.

Only those in builtin can use it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24 11:55:06 -08:00
3a95f31d1c repository.c: replace hold_locked_index() with repo_hold_locked_index()
hold_locked_index() assumes the index path at $GIT_DIR/index. This is
not good for places that take an arbitrary index_state instead of
the_index, which is basically everywhere except builtin/.

Replace it with repo_hold_locked_index(). hold_locked_index() remains
as a wrapper around repo_hold_locked_index() to reduce changes in builtin/

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14 12:13:04 -08:00
d5498e0871 repository: repo_submodule_init to take a submodule struct
When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
situation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.
The submodule struct can be obtained via submodule_from_{path, name} or
an artificial submodule struct can be passed in.

While we are at it, rename the repository struct in the repo_submodule_init
function, which is to be initialized, to a name that is not confused with
the struct submodule as easily. Perform such renames in similar functions
as well.

Also move its documentation into the header file.

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-05 11:42:32 +09:00
f0eaf63819 sha1-file: use an object_directory for the main object dir
Our handling of alternate object directories is needlessly different
from the main object directory. As a result, many places in the code
basically look like this:

  do_something(r->objects->objdir);

  for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
        do_something(odb->path);

That gets annoying when do_something() is non-trivial, and we've
resorted to gross hacks like creating fake alternates (see
find_short_object_filename()).

Instead, let's give each raw_object_store a unified list of
object_directory structs. The first will be the main store, and
everything after is an alternate. Very few callers even care about the
distinction, and can just loop over the whole list (and those who care
can just treat the first element differently).

A few observations:

  - we don't need r->objects->objectdir anymore, and can just
    mechanically convert that to r->objects->odb->path

  - object_directory's path field needs to become a real pointer rather
    than a FLEX_ARRAY, in order to fill it with expand_base_dir()

  - we'll call prepare_alt_odb() earlier in many functions (i.e.,
    outside of the loop). This may result in us calling it even when our
    function would be satisfied looking only at the main odb.

    But this doesn't matter in practice. It's not a very expensive
    operation in the first place, and in the majority of cases it will
    be a noop. We call it already (and cache its results) in
    prepare_packed_git(), and we'll generally check packs before loose
    objects. So essentially every program is going to call it
    immediately once per program.

    Arguably we should just prepare_alt_odb() immediately upon setting
    up the repository's object directory, which would save us sprinkling
    calls throughout the code base (and forgetting to do so has been a
    source of subtle bugs in the past). But I've stopped short of that
    here, since there are already a lot of other moving parts in this
    patch.

  - Most call sites just get shorter. The check_and_freshen() functions
    are an exception, because they have entry points to handle local and
    nonlocal directories separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13 14:22:03 +09:00
110240588d Merge branch 'sb/object-store-alloc'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-alloc:
  alloc: allow arbitrary repositories for alloc functions
  object: allow create_object to handle arbitrary repositories
  object: allow grow_object_hash to handle arbitrary repositories
  alloc: add repository argument to alloc_commit_index
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_blob_node
  object: add repository argument to grow_object_hash
  object: add repository argument to create_object
  repository: introduce parsed objects field
2018-06-25 13:22:38 -07:00
0abb962893 Merge branch 'nd/repo-clear-keep-the-index'
the_repository->index is not a allocated piece of memory but
repo_clear() indiscriminately attempted to free(3) it, which has
been corrected.

* nd/repo-clear-keep-the-index:
  repository: fix free problem with repo_clear(the_repository)
2018-05-30 14:04:05 +09:00
74373b5f10 repository: fix free problem with repo_clear(the_repository)
the_repository is special. One of the special things about it is that
it does not allocate a new index_state object like submodules but
points to the global the_index variable instead. As a global variable,
the_index cannot be free()'d.

Add an exception for this in repo_clear(). In the future perhaps we
would be able to allocate the_repository's index on heap too. Then we
can revert this.

the_repository->index remains pointed to a clean the_index even after
repo_clear() so that it could still be used next time (e.g. in a crazy
use case where a dev switches repo in the same process).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 18:33:09 +09:00
99bf115c87 repository: introduce parsed objects field
Convert the existing global cache for parsed objects (obj_hash) into
repository-specific parsed object caches. Existing code that uses
obj_hash are modified to use the parsed object cache of
the_repository; future patches will use the parsed object caches of
other repositories.

Another future use case for a pool of objects is ease of memory management
in revision walking: If we can free the rev-list related memory early in
pack-objects (e.g. part of repack operation) then it could lower memory
pressure significantly when running on large repos. While this has been
discussed on the mailing list lately, this series doesn't implement this.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
da62f786d2 submodule: fixup nested submodules after moving the submodule
connect_work_tree_and_git_dir is used to connect a submodule worktree with
its git directory and vice versa after events that require a reconnection
such as moving around the working tree. As submodules can have nested
submodules themselves, we'd also want to fix the nested submodules when
asked to. Add an option to recurse into the nested submodules and connect
them as well.

As submodules are identified by their name (which determines their git
directory in relation to their superproject's git directory) internally
and by their path in the working tree of the superproject, we need to
make sure that the mapping of name <-> path is kept intact. We can do
that in the git-mv command by writing out the gitmodules file first
and then forcing a reload of the submodule config machinery.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-29 09:44:51 -07:00
0c89fdd739 submodule-config: remove submodule_from_cache
This continues the story of bf12fcdf5e (submodule-config: store
the_submodule_cache in the_repository, 2017-06-22).

The previous patch taught submodule_from_path to take a repository into
account, such that submodule_from_{path, cache} are the same now.
Remove submodule_from_cache, migrating all its callers to
submodule_from_path.

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-29 09:44:51 -07:00
90c62155d6 repository: introduce raw object store field
The raw object store field will contain any objects needed for access
to objects in a given repository.

This patch introduces the raw object store and populates it with the
`objectdir`, which used to be part of the repository struct.

As the struct gains members, we'll also populate the function to clear
the memory for these members.

In a later step, we'll introduce a struct object_parser, that will
complement the object parsing in a repository struct: The raw object
parser is the layer that will provide access to raw object content,
while the higher level object parser code will parse raw objects and
keeps track of parenthood and other object relationships using 'struct
object'.  For now only add the lower level to the repository struct.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-23 11:06:01 -07:00
2bee50a083 repository: delete ignore_env member
This variable was added because the repo_set_gitdir() was created to
cover both submodule and main repos, but these two are initialized a
bit differently so ignore_env == 0 means main repo, while ignore_env
!= 0 is submodules.

Since the difference part (env variables) has been moved out of
repo_set_gitdir(), this function works the same way for both repo
types and ignore_env is not needed anymore.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:04 -08:00
7bc0dcaa61 sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
getenv() is supposed to work on the main repository only. This delayed
getenv() code in sha1_file.c makes it more difficult to convert
sha1_file.c to a generic object store that could be used by both
submodule and main repositories.

Move the getenv() back in setup_git_env() where other env vars are
also fetched.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
0ac5af5995 repository.c: delete dead functions
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
357a03ebe9 repository.c: move env-related setup code back to environment.c
It does not make sense that generic repository code contains handling
of environment variables, which are specific for the main repository
only. Refactor repo_set_gitdir() function to take $GIT_DIR and
optionally _all_ other customizable paths. These optional paths can be
NULL and will be calculated according to the default directory layout.

Note that some dead functions are left behind to reduce diff
noise. They will be deleted in the next patch.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
b2f0eceecf repository: initialize the_repository in main()
This simplifies initialization of struct repository and anything
inside. Easier to read. Easier to add/remove fields.

Everything will go through main() common-main.c so this should cover all
programs, including t/helper.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
e75c862125 Merge branch 'tg/split-index-fixes'
The split-index mode had a few corner case bugs fixed.

* tg/split-index-fixes:
  travis: run tests with GIT_TEST_SPLIT_INDEX
  split-index: don't write cache tree with null oid entries
  read-cache: fix reading the shared index for other repos
2018-02-13 13:39:13 -08:00
e26f7f19b6 repository: pre-initialize hash algo pointer
There are various git subcommands (among them, clone) which don't set up
the repository (that is, they lack RUN_SETUP or RUN_SETUP_GENTLY) but
end up needing to have information about the hash algorithm in use.
Because the hash algorithm is part of struct repository and it's only
initialized in repository setup, we can end up dereferencing a NULL
pointer in some cases if we call one of these subcommands and look up
the empty blob or empty tree values.

A "git clone" of a project that has two paths that differ only in
case suffers from this if it is run on a case insensitive platform.
When the command attempts to check out one of these two paths after
checking out the other one, the checkout codepath needs to see if
the version that is already on the filesystem (which should not
happen if the FS were case sensitive) is dirty, and it needs to
exercise the hashing code at that point.

In the future, we can add a command line option for this or read it
from the configuration, but until we're ready to expose that
functionality to the user, simply initialize the repository
structure to use the current hash algorithm, SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 14:23:32 -08:00
a125a22334 read-cache: fix reading the shared index for other repos
read_index_from() takes a path argument for the location of the index
file.  For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.

This works as long as an index in the_repository is read.  Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.

Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.

read_cache_from() defaults to using the gitdir of the_repository.  As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().

Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 10:36:34 -08:00
c250e02e2c repository: fix a sparse 'using integer as NULL pointer' warning
Commit 78a6766802 ("Integrate hash algorithm support with repo setup",
2017-11-12) added a 'const struct git_hash_algo *hash_algo' field to the
repository structure, without modifying the initializer of the 'the_repo'
variable. This does not actually introduce a bug, since the '0' initializer
for the 'ignore_env:1' bit-field is interpreted as a NULL pointer (hence
the warning), and the final field (now with no initializer) receives a
default '0'.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-28 12:37:37 +09:00
78a6766802 Integrate hash algorithm support with repo setup
In future versions of Git, we plan to support an additional hash
algorithm.  Integrate the enumeration of hash algorithms with repository
setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in
read_repository_format.  In the future, we'll enumerate this value from
the configuration.

Add a constant, the_hash_algo, which points to the hash_algo structure
pointer in the repository global.  Note that this is the hash which is
used to serialize data to disk, not the hash which is used to display
items to the user.  The transition plan anticipates that these may be
different.  We can add an additional element in the future (say,
ui_hash_algo) to provide for this case.

Include repository.h in cache.h since we now need to have access to
these struct and variable definitions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13 13:20:44 +09:00
90dd04aaeb repository: use FREE_AND_NULL
Use the macro FREE_AND_NULL to release allocated objects and clear their
pointers.  This is shorter and documents the intent better by combining
the two related operations into one.

Patch generated with Coccinelle and contrib/coccinelle/free.cocci.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-02 13:15:59 +09:00
1fb2b636c6 set_git_dir: handle feeding gitdir to itself
Ideally we'd free the existing gitdir field before assigning
the new one, to avoid a memory leak. But we can't do so
safely because some callers do the equivalent of:

  set_git_dir(get_git_dir());

We can detect that case as a noop, but there are even more
complicated cases like:

  set_git_dir(remove_leading_path(worktree, get_git_dir());

where we really do need to do some work, but the original
string must remain valid.

Rather than put the burden on callers to make a copy of the
string (only to free it later, since we'll make a copy of it
ourselves), let's solve the problem inside set_git_dir(). We
can make a copy of the pointer for the old gitdir, and then
avoid freeing it until after we've made our new copy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 18:06:26 +09:00
f9b7573f6b repository: free fields before overwriting them
It's possible that the repository data may be initialized
twice (e.g., after doing a chdir() to the top of the
worktree we may have to adjust a relative git_dir path). We
should free() any existing fields before assigning to them
to avoid leaks.

This should be safe, as the fields are set based on the
environment or on other strings like the gitdir or
commondir. That makes it impossible that we are feeding an
alias to the just-freed string.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 18:06:26 +09:00