Commit Graph

3 Commits

Author SHA1 Message Date
647b5e0998 tests: adjust whitespace in chainlint expectations
The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.

The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
`diff -w -u`.

To accomodate for cases where the host system has no Git installation we
use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand. It will refuse to
run and thus cause the checks to fail.

Instead of improving the detection logic, fix our ".expect" files so
that we do not need any post-processing at all anymore. This allows us
to drop the `-w` flag when diffing so that we can always use diff(1)
now.

Note that we keep some of the post-processing of `chainlint.pl` output
intact to strip leading line numbers generated by the script. Having
these would cause a rippling effect whenever we add a new test that
sorts into the middle of existing tests and would require us to
renumerate all subsequent lines, which seems rather pointless.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 08:36:14 -08:00
73c768dae9 chainlint: annotate original test definition rather than token stream
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.

Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:

    msg="total: $(cd data; wc -w *.txt) words"

requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&&-chain between the `cd` and `wc` commands, returning the token stream:

    cd data ; ?!AMP?! wc -w *.txt

However, the parent parse context will see everything inside the
double-quotes as a single string token:

    "total: $(cd data ; ?!AMP?! wc -w *.txt) words"

losing whatever positional information was attached to the ";" token
where the problem was detected.

One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:

    "total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"

and then extract the positional information when annotating the original
test definition.

However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO!?" items specially.

Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:

    * indentation (and whitespace, in general)

    * comments

    * here-doc bodies

    * here-doc tag quoting (i.e. "\EOF")

    * line-splices (i.e. "\" at the end of a line)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 15:10:49 -05:00
56066523ed t/chainlint: add more chainlint.pl self-tests
During the development of chainlint.pl, numerous new self-tests were
created to verify correct functioning beyond the checks already
represented by the existing self-tests. The new checks fall into several
categories:

* behavior of the lexical analyzer for complex cases, such as line
  splicing, token pasting, entering and exiting string contexts inside
  and outside of test script bodies; for instance:

    test_expect_success 'title' '
      x=$(echo "something" |
        sed -e '\''s/\\/\\\\/g'\'' -e '\''s/[[/.*^$]/\\&/g'\''
    '

* behavior of the parser for all compound grammatical constructs, such
  as `if...fi`, `case...esac`, `while...done`, `{...}`, etc., and for
  other legal shell grammatical constructs not covered by existing
  chainlint.sed self-tests, as well as complex cases, such as:

    OUT=$( ((large_git 1>&3) | :) 3>&1 ) &&

* detection of problems, such as &&-chain breakage, from top-level to
  any depth since the existing self-tests do not cover any top-level
  context and only cover subshells one level deep due to limitations of
  chainlint.sed

* address blind spots in chainlint.sed (such as not detecting a broken
  &&-chain on a one-line for-loop in a subshell[1]) which chainlint.pl
  correctly detects

* real-world cases which tripped up chainlint.pl during its development

[1]: https://lore.kernel.org/git/dce35a47012fecc6edc11c68e91dbb485c5bc36f.1661663880.git.gitgitgadget@gmail.com/

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:41 -07:00