convert: add round trip check based on 'core.checkRoundtripEncoding'

UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1].

Add 'core.checkRoundtripEncoding', which contains a comma separated
list of encodings, to define for what encodings Git should check the
conversion round trip if they are used in the 'working-tree-encoding'
attribute.

Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'.

[1] https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Lars Schneider
2018-04-15 20:16:10 +02:00
committed by Junio C Hamano
parent 541d059cd9
commit e92d622536
7 changed files with 137 additions and 0 deletions

View File

@ -203,4 +203,43 @@ test_expect_success 'error if encoding garbage is already in Git' '
test_i18ngrep "error: BOM is required" err.out
'
test_expect_success 'check roundtrip encoding' '
test_when_finished "rm -f roundtrip.shift roundtrip.utf16" &&
test_when_finished "git reset --hard HEAD" &&
text="hallo there!\nroundtrip test here!" &&
printf "$text" | iconv -f UTF-8 -t SHIFT-JIS >roundtrip.shift &&
printf "$text" | iconv -f UTF-8 -t UTF-16 >roundtrip.utf16 &&
echo "*.shift text working-tree-encoding=SHIFT-JIS" >>.gitattributes &&
# SHIFT-JIS encoded files are round-trip checked by default...
GIT_TRACE=1 git add .gitattributes roundtrip.shift 2>&1 |
grep "Checking roundtrip encoding for SHIFT-JIS" &&
git reset &&
# ... unless we overwrite the Git config!
! GIT_TRACE=1 git -c core.checkRoundtripEncoding=garbage \
add .gitattributes roundtrip.shift 2>&1 |
grep "Checking roundtrip encoding for SHIFT-JIS" &&
git reset &&
# UTF-16 encoded files should not be round-trip checked by default...
! GIT_TRACE=1 git add roundtrip.utf16 2>&1 |
grep "Checking roundtrip encoding for UTF-16" &&
git reset &&
# ... unless we tell Git to check it!
GIT_TRACE=1 git -c core.checkRoundtripEncoding="UTF-16, UTF-32" \
add roundtrip.utf16 2>&1 |
grep "Checking roundtrip encoding for utf-16" &&
git reset &&
# ... unless we tell Git to check it!
# (here we also check that the casing of the encoding is irrelevant)
GIT_TRACE=1 git -c core.checkRoundtripEncoding="UTF-32, utf-16" \
add roundtrip.utf16 2>&1 |
grep "Checking roundtrip encoding for utf-16" &&
git reset
'
test_done