Commit Graph

192 Commits

Author SHA1 Message Date
41771fa435 cache.h: remove dependence on hex.h; make other files include it explicitly
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
61bdc7c5d8 diff: mark unused parameters in callbacks
The diff code provides a format_callback interface, but not every
callback needs each parameter (e.g., the "opt" and "data" parameters are
frequently left unused). Likewise for the output_prefix callback, the
low-level change/add_remove interfaces, the callbacks used by
xdi_diff(), etc.

Mark unused arguments in the callback implementations to quiet
-Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 22:16:23 +09:00
dd407f1c7c Merge branch 'ab/unused-annotation'
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.

* ab/unused-annotation:
  git-compat-util.h: use "deprecated" for UNUSED variables
  git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14 12:56:39 -07:00
a6b42ec0c6 Merge branch 'jk/unused-annotation'
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.

* jk/unused-annotation:
  is_path_owned_by_current_uid(): mark "report" parameter as unused
  run-command: mark unused async callback parameters
  mark unused read_tree_recursive() callback parameters
  hashmap: mark unused callback parameters
  config: mark unused callback parameters
  streaming: mark unused virtual method parameters
  transport: mark bundle transport_options as unused
  refs: mark unused virtual method parameters
  refs: mark unused reflog callback parameters
  refs: mark unused each_ref_fn parameters
  git-compat-util: add UNUSED macro
2022-09-14 12:56:39 -07:00
5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
02c3c59e62 hashmap: mark unused callback parameters
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
99d86d60e5 parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options
The description of 'PARSE_OPT_KEEP_UNKNOWN' starts with "Keep unknown
arguments instead of erroring out".  This is a bit misleading, as this
flag only applies to unknown --options, while non-option arguments are
kept even without this flag.

Update the description to clarify this, and rename the flag to
PARSE_OPTIONS_KEEP_UNKNOWN_OPT to make this obvious just by looking at
the flag name.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 11:13:14 -07:00
2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
2cc712324d Merge branch 'rs/fast-export-pathspec-fix'
"git fast-export -- <pathspec>" lost the pathspec when showing the
second and subsequent commits, which has been corrected.

* rs/fast-export-pathspec-fix:
  2.36 fast-export regression fix
2022-05-04 09:51:28 -07:00
d1c25272f5 2.36 fast-export regression fix
e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
that mechanism mandatory.

git fast-export doesn't set no_free, so path limiting stopped working
after the first commit.  Set the flag and add a basic test to make sure
only changes to the specified files are exported.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-30 11:50:33 -07:00
2108fe4a19 revisions API users: add straightforward release_revisions()
Add a release_revisions() to various users of "struct rev_list" in
those straightforward cases where we only need to add the
release_revisions() call to the end of a block, and don't need to
e.g. refactor anything to use a "goto cleanup" pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
430883a70c Merge branch 'ab/object-file-api-updates'
Object-file API shuffling.

* ab/object-file-api-updates:
  object-file API: pass an enum to read_object_with_reference()
  object-file.c: add a literal version of write_object_file_prepare()
  object-file API: have hash_object_file() take "enum object_type"
  object API: rename hash_object_file_literally() to write_*()
  object-file API: split up and simplify check_object_signature()
  object API users + docs: check <0, not !0 with check_object_signature()
  object API docs: move check_object_signature() docs to cache.h
  object API: correct "buf" v.s. "map" mismatch in *.c and *.h
  object-file API: have write_object_file() take "enum object_type"
  object-file API: add a format_object_header() function
  object-file API: return "void", not "int" from hash_object_file()
  object-file.c: split up declaration of unrelated variables
2022-03-16 17:53:08 -07:00
44439c1c58 object-file API: have hash_object_file() take "enum object_type"
Change the hash_object_file() function to take an "enum
object_type".

Since a preceding commit all of its callers are passing either
"{commit,tree,blob,tag}_type", or the result of a call to type_name(),
the parse_object() caller that would pass NULL is now using
stream_object_signature().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:32 -08:00
0f156dbb04 object-file API: split up and simplify check_object_signature()
Split up the check_object_signature() function into that non-streaming
version (it accepts an already filled "buf"), and a new
stream_object_signature() which will retrieve the object from storage,
and hash it on-the-fly.

All of the callers of check_object_signature() were effectively
calling two different functions, if we go by cyclomatic
complexity. I.e. they'd either take the early "if (map)" branch and
return early, or not. This has been the case since the "if (map)"
condition was added in 090ea12671 (parse_object: avoid putting whole
blob in core, 2012-03-07).

We can then further simplify the resulting check_object_signature()
function since only one caller wanted to pass a non-NULL "buf" and a
non-NULL "real_oidp". That "read_loose_object()" codepath used by "git
fsck" can instead use hash_object_file() followed by oideq().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
9164d97a63 i18n: fix some misformated placeholders in command synopsis
* add '<>' around arguments where missing
 * convert plurals into '...' forms

This applies the style guide for documentation.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-04 13:58:28 -08:00
c17de5a505 Merge branch 'ja/i18n-similar-messages'
Similar message templates have been consolidated so that
translators need to work on fewer number of messages.

* ja/i18n-similar-messages:
  i18n: turn even more messages into "cannot be used together" ones
  i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"
  i18n: factorize "--foo outside a repository"
  i18n: refactor "unrecognized %(foo) argument" strings
  i18n: factorize "no directory given for --foo"
  i18n: factorize "--foo requires --bar" and the like
  i18n: tag.c factorize i18n strings
  i18n: standardize "cannot open" and "cannot read"
  i18n: turn "options are incompatible" into "cannot be used together"
  i18n: refactor "%s, %s and %s are mutually exclusive"
  i18n: refactor "foo and bar are mutually exclusive"
2022-01-10 11:52:56 -08:00
6fa00ee843 i18n: factorize "--foo requires --bar" and the like
They are all replaced by "the option '%s' requires '%s'", which is a
new string but replaces 17 previous unique strings.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:31:00 -08:00
12909b6b8a i18n: turn "options are incompatible" into "cannot be used together"
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:29:23 -08:00
726a228dfb fast-export: fix surprising behavior with --first-parent
The revision traversal machinery typically processes and returns all
children before any parent.  fast-export needs to operate in the
reverse fashion, handling parents before any of their children in
order to build up the history starting from the root commit(s).  This
would be a clear case where we could just use the revision traversal
machinery's "reverse" option to achieve this desired affect.

However, this wasn't what the code did.  It added its own array for
queuing.  The obvious hand-rolled solution would be to just push all
the commits into the array and then traverse afterwards, but it didn't
quite do that either.  It instead attempted to process anything it
could as soon as it could, and once it could, check whether it could
process anything that had been queued.  As far as I can tell, this was
an effort to save a little memory in the case of multiple root commits
since it could process some commits before queueing all of them.  This
involved some helper functions named has_unshown_parent() and
handle_tail().  For typical invocations of fast-export, this
alternative essentially amounted to a hand-rolled method of reversing
the commits -- it was a bunch of work to duplicate the revision
traversal machinery's "reverse" option.

This hand-rolled reversing mechanism is actually somewhat difficult to
reason about.  It takes some time to figure out how it ensures in
normal cases that it will actually process all traversed commits
(rather than just dropping some and not printing anything for them).

And it turns out there are some cases where the code does drop commits
without handling them, and not even printing an error or warning for
the user.  Due to the has_unshown_parent() checks, some commits could
be left in the array at the end of the "while...get_revision()" loop
which would be unprocessed.  This could be triggered for example with
    git fast-export main -- --first-parent
or non-sensical traversal rules such as
    git fast-export main -- --grep=Merge --invert-grep

While most traversals that don't include all parents should likely
trigger errors in fast-export (or at least require being used in
combination with --reference-excluded-parents), the --first-parent
traversal is at least reasonable and it'd be nice if it didn't just drop
commits. It'd also be nice for future readers of the code to have a
simpler "reverse traversal" mechanism. Use the "reverse" option of the
revision traversal machinery to achieve both.

Even for the non-sensical traversal flags like the --grep one above,
this would be an improvement. For example, in that case, the code
previously would have silently truncated history to only those commits
that do not have an ancestor containing "Merge" in their commit message.
After this code change, that case would include all commits without
"Merge" in their commit message -- but any commit that previously had a
"Merge"-mentioning parent would lose that parent
(likely resulting in many new root commits). While the new behavior is
still odd, it is at least understandable given that
--reference-excluded-parents is not the default.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: William Sprent <williams@unity3d.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-21 12:51:09 -08:00
061a21d36d Merge branch 'ab/fsck-unexpected-type'
"git fsck" has been taught to report mismatch between expected and
actual types of an object better.

* ab/fsck-unexpected-type:
  fsck: report invalid object type-path combinations
  fsck: don't hard die on invalid object types
  object-file.c: stop dying in parse_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: don't set "typep" when returning non-zero
  cat-file tests: test for current --allow-unknown-type behavior
  cat-file tests: add corrupt loose object test
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: move bogus_* variable declarations earlier
  fsck tests: test for garbage appended to a loose object
  fsck tests: test current hash/type mismatch behavior
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
2021-10-25 16:06:56 -07:00
96e41f58fe fsck: report invalid object type-path combinations
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.

There's other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. E.g. if check_object_signature() returns
and oideq(real_oid, null_oid()) is true, which could happen if it
returns -1 due to the read_istream() call having failed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:01 -07:00
febba8038d Merge branch 'tk/fast-export-anonymized-tag-fix'
The output from "git fast-export", when its anonymization feature
is in use, showed an annotated tag incorrectly.

* tk/fast-export-anonymized-tag-fix:
  fast-export: fix anonymized tag using original length
2021-09-10 11:46:27 -07:00
2f040a9671 fast-export: fix anonymized tag using original length
Commit 7f40759496 (fast-export: tighten anonymize_mem() interface to
handle only strings, 2020-06-23) changed the interface used in anonymizing
strings, but failed to update the size of annotated tag messages to match
the new anonymized string.

As a result, exporting tags having messages longer than 13 characters
would create output that couldn't be parsed by fast-import,
as the data length indicated was larger than the data output.

Reset the message size when anonymizing, and add a tag with a "long"
message to the test.

Signed-off-by: Tal Kelrich <hasturkun@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 12:11:57 -07:00
14228447c9 hash: provide per-algorithm null OIDs
Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:39 +09:00
e73fe3dd02 builtin/*: update usage format
According to the guidelines in parse-options.h,
we should not end in a full stop or start with
a capital letter. Fix old error and usage
messages to match this expectation.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06 15:10:49 -08:00
56f56ac50b style: do not "break" in switch() after "return"
Remove this unreachable code. It was found by SunCC, it's found by a
non-fatal warning emitted by SunCC. It's one of the things it's more
vehement about than GCC & Clang.

It complains about a lot of other similarly unreachable code, e.g. a
BUG(...) without a "return", and a "return 0" after a long if/else,
both of whom have "return" statements. Those are also genuine
redundancies to a compiler, but arguably make the code a bit easier to
read & less fragile to maintain.

These return/break cases are just unnecessary however, and as seen
here the surrounding code just did a plain "return" without a "break"
already.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-15 16:32:50 -08:00
58138d3f26 Merge branch 'js/default-branch-name-part-2'
Update the tests to drop word 'master' from them.

* js/default-branch-name-part-2:
  t9902: avoid using the branch name `master`
  tests: avoid variations of the `master` branch name
  t3200: avoid variations of the `master` branch name
  fast-export: avoid using unnecessary language in a code comment
  t/test-terminal: avoid non-inclusive language
2020-10-05 14:01:50 -07:00
86cca370e1 Merge branch 'jk/drop-unaligned-loads'
Compilation fix around type punning.

* jk/drop-unaligned-loads:
  Revert "fast-export: use local array to store anonymized oid"
  bswap.h: drop unaligned loads
2020-10-04 12:49:06 -07:00
176380fd11 Revert "fast-export: use local array to store anonymized oid"
This reverts commit f39ad38410.

That commit was trying to silence a type-punning warning on older
versions of gcc. However, its analysis was all wrong. I didn't notice
that we _were_ in fact type-punning because there are two versions of
put_be32(): one that uses casts and unaligned loads, and another that
uses bitshifts. I looked at the latter, but on my platform we were
defaulting to the former.

However, as of the previous commit, we'll always use the bitshift
version. So we can drop this hackery to avoid the warning, making the
code slightly cleaner.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:30:11 -07:00
5a0c32bd4b fast-export: avoid using unnecessary language in a code comment
In an ongoing effort to avoid non-inclusive language, let's avoid using
the branch name "master" in a code comment.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:27 -07:00
f24c30e0b6 wt-status: tolerate dangling marks
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:

  git clone $URL client
  cd client
  git checkout @{u}
  git status

no status is printed, but instead an error message:

  fatal: HEAD does not point to a branch

(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)

This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.

Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-02 14:39:25 -07:00
f39ad38410 fast-export: use local array to store anonymized oid
Some older versions of gcc complain about this line:

  builtin/fast-export.c:412:2: error: dereferencing type-punned pointer
       will break strict-aliasing rules [-Werror=strict-aliasing]
    put_be32(oid.hash + hashsz - 4, counter++);
    ^

This seems to be a false positive, as there's no type-punning at all
here. oid.hash is an array of unsigned char; when we pass it to a
function it decays to a pointer to unsigned char. We do take a void
pointer in put_be32(), but it's immediately aliased with another pointer
to unsigned char (and clearly the compiler is looking inside the inlined
put_be32(), since the warning doesn't happen with -O0).

This happens on gcc 4.8 and 4.9, but not later versions (I tested gcc 6,
7, 8, and 9).

We can work around it by using a local array instead of an object_id
struct. This is a little more intimate with the details of object_id,
but for whatever reason doesn't seem to trigger the compiler warning.
We can revert this patch once we decide that those gcc versions are too
old to care about for a warning like this (gcc 4.8 is the default
compiler for Ubuntu Trusty, which is out-of-support but not fully
end-of-life'd until April 2022).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
8a49495583 fast-export: anonymize "master" refname
Running "fast-export --anonymize" will leave "refs/heads/master"
untouched in the output, for two reasons:

  - it helped to have some known reference point between the original
    and anonymized repository

  - since it's historically the default branch name, it doesn't leak any
    information

Now that we can ask fast-export to retain particular tokens, we have a
much better tool for the first one (because it works for any ref, not
just master).

For the second, the notion of "default branch name" is likely to become
configurable soon, at which point the name _does_ leak information.
Let's drop this special case in preparation.

Note that we have to adjust the test a bit, since it relied on using the
name "master" in the anonymized repos. We could just use
--anonymize-map=master to keep the same output, but then we wouldn't
know if it works because of our hard-coded master or because of the
explicit map.

So let's flip the test a bit, and confirm that we anonymize "master",
but keep "other" in the output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
65b5d9fae7 fast-export: allow seeding the anonymized mapping
After you anonymize a repository, it can be hard to find which commits
correspond between the original and the result, and thus hard to
reproduce commands that triggered bugs in the original.

Let's make it possible to seed the anonymization map. This lets users
either:

  - mark names to be retained as-is, if they don't consider them secret
    (in which case their original commands would just work)

  - map names to new values, which lets them adapt the reproduction
    recipe to the new names without revealing the originals

The implementation is fairly straight-forward. We already store each
anonymized token in a hashmap (so that the same token appearing twice is
converted to the same result). We can just introduce a new "seed"
hashmap which is consulted first.

This does make a few more promises to the user about how we'll anonymize
things (e.g., token-splitting pathnames). But it's unlikely that we'd
want to change those rules, even if the actual anonymization of a single
token changes. And it makes things much easier for the user, who can
unblind only a directory name without having to specify each path within
it.

One alternative to this approach would be to anonymize as we see fit,
and then dump the whole refname and pathname mappings to a file. This
does work, but it's a bit awkward to use (you have to manually dig the
items you care about out of the mapping).

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
d5bf91fde4 fast-export: add a "data" callback parameter to anonymize_str()
The anonymize_str() function takes a generator callback, but there's no
way to pass extra context to it. Let's add the usual "void *data"
parameter to the generator interface and pass it along.

This is mildly annoying for existing callers, all of which pass NULL,
but is necessary to avoid extra globals in some cases we'll add in a
subsequent patch.

While we're touching each of these callbacks, we can further observe
that none of them use the existing orig/len parameters at all. This
makes sense, since the point is for their output to have no discernable
basis in the original (my original version had some notion that we might
use a one-way function to obfuscate the names, but it was never
implemented). So let's drop those extra parameters. If a caller really
wants to do something with them, it can pass a struct through the new
data parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
6416a865da fast-export: move global "idents" anonymize hashmap into function
All of the other anonymization functions keep their static mappings
inside the function to avoid polluting the global namespace. Let's do
the same for "idents", as nobody needs it outside of
anonymize_ident_line().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
55b01456a9 fast-export: use a flex array to store anonymized entries
Now that we're using a separate keydata struct for hash lookups, we have
more flexibility in how we allocate anonymized_entry structs. Let's push
the "orig" key into a flex member within the struct. That should save us
a few bytes of memory per entry (a pointer plus any malloc overhead),
and may make lookups a little faster (since it's one less pointer to
chase in the comparison function).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
a0f65641df fast-export: stop storing lengths in anonymized hashmaps
Now that the anonymize_str() interface is restricted to NUL-terminated
strings, there's no need for us to keep track of the length of each
entry in the hashmap. This simplifies the code and saves a bit of
memory.

Note that we do still need to compare the stored results to partial
strings passed in by the callers. We can do that by using hashmap's
keydata feature to get the ptr/len pair into the comparison function,
and then using strncmp().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
7f40759496 fast-export: tighten anonymize_mem() interface to handle only strings
While the anonymize_mem() interface _can_ store arbitrary byte
sequences, none of the callers uses this feature (as of the previous
commit). We'd like to keep it that way, as we'll be exposing the
string-like nature of the anonymization routines to the user. So let's
tighten up the interface a bit:

  - don't treat "len" as an out-parameter from anonymize_mem(); this
    ensures callers treat the pointer result as a NUL-terminated string

  - likewise, don't treat "len" as an out-parameter from generator
    functions

  - swap out "void *" for "char *" as appropriate to signal that we
    don't handle arbitrary memory

  - rename the function to anonymize_str()

This will also open up some optimization opportunities in a future
patch.

Note that we can't drop the "len" parameter entirely. Some callers do
pass in partial strings (e.g., "foo/bar", len=3) to avoid copying, and
we need to handle those still.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
750bb32589 fast-export: store anonymized oids as hex strings
When fast-export stores anonymized oids, it does so as binary strings.
And while the anonymous mapping storage is binary-clean (at least as of
the previous commit), this will become awkward when we start exposing
more of it to the user. In particular, if we allow a method for
retaining token "foo", then users may want to specify a hex oid as such
a token.

Let's just switch to storing the hex strings. The difference in memory
usage is negligible (especially considering how infrequently we'd
generally store an oid compared to, say, path components).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
b897bf5f37 fast-export: use xmemdupz() for anonymizing oids
Our anonymize_mem() function is careful to take a ptr/len pair to allow
storing binary tokens like object ids, as well as partial strings (e.g.,
just "foo" of "foo/bar"). But it duplicates the hash key using
xstrdup()! That means that:

  - for a partial string, we'd store all bytes up to the NUL, even
    though we'd never look at anything past "len". This didn't produce
    wrong behavior, but was wasteful.

  - for a binary oid that doesn't contain a zero byte, we'd copy garbage
    bytes off the end of the array (though as long as nothing complained
    about reading uninitialized bytes, further reads would be limited by
    "len", and we'd produce the correct results)

  - for a binary oid that does contain a zero byte, we'd copy _fewer_
    bytes than intended into the hashmap struct. When we later try to
    look up a value, we'd access uninitialized memory and potentially
    falsely claim that a particular oid is not present.

The most common reason to store an oid is an anonymized gitlink, but our
test case doesn't have any gitlinks at all. So let's add one whose oid
contains a NUL and is present at two different paths. ASan catches the
memory error, but even without it we can detect the bug because the oid
is not anonymized the same way for both paths.

And of course the fix is to copy the correct number of bytes. We don't
technically need the appended NUL from xmemdupz(), but it doesn't hurt
as an extra protection against anybody treating it like a string (plus a
future patch will push us more in that direction).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
78e67cda42 Merge branch 'mt/use-passed-repo-more-in-funcs'
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).

* mt/use-passed-repo-more-in-funcs:
  sha1-file: allow check_object_signature() to handle any repo
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  streaming: allow open_istream() to handle any repo
  pack-check: use given repo's hash_algo at verify_packfile()
  cache-tree: use given repo's hash_algo at verify_one()
  diff: make diff_populate_filespec() honor its repo argument
2020-02-14 12:54:22 -08:00
145136a95a C: use skip_prefix() to avoid hardcoded string length
We often skip an optional prefix in a string with a hardcoded
constant, e.g.

	if (starts_with(string, "prefix"))
		string += 6;

which is less error prone when written

	skip_prefix(string, "prefix", &string);

Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying

    '--expire=nonsense.timestamp' is not a valid timestamp

but with this change, we say

    'nonsense.timestamp' is not a valid timestamp

which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 13:03:45 -08:00
b98d188581 sha1-file: allow check_object_signature() to handle any repo
Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
5efabc7ed9 Merge branch 'ew/hashmap'
Code clean-up of the hashmap API, both users and implementation.

* ew/hashmap:
  hashmap_entry: remove first member requirement from docs
  hashmap: remove type arg from hashmap_{get,put,remove}_entry
  OFFSETOF_VAR macro to simplify hashmap iterators
  hashmap: introduce hashmap_free_entries
  hashmap: hashmap_{put,remove} return hashmap_entry *
  hashmap: use *_entry APIs for iteration
  hashmap_cmp_fn takes hashmap_entry params
  hashmap_get{,_from_hash} return "struct hashmap_entry *"
  hashmap: use *_entry APIs to wrap container_of
  hashmap_get_next returns "struct hashmap_entry *"
  introduce container_of macro
  hashmap_put takes "struct hashmap_entry *"
  hashmap_remove takes "const struct hashmap_entry *"
  hashmap_get takes "const struct hashmap_entry *"
  hashmap_add takes "struct hashmap_entry *"
  hashmap_get_next takes "const struct hashmap_entry *"
  hashmap_entry_init takes "struct hashmap_entry *"
  packfile: use hashmap_entry in delta_base_cache_entry
  coccicheck: detect hashmap_entry.hash assignment
  diff: use hashmap_entry_init on moved_entry.ent
2019-10-15 13:48:02 +09:00
404ab78e39 hashmap: remove type arg from hashmap_{get,put,remove}_entry
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.

Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
939af16eac hashmap_cmp_fn takes hashmap_entry params
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
f23a465132 hashmap_get{,_from_hash} return "struct hashmap_entry *"
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash
or container_of as appropriate.

This is another step towards eliminating the requirement of
hashmap_entry being the first field in a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
26b455f21e hashmap_put takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
b6c5241606 hashmap_get takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00