update_unicode.sh: move it into contrib/update-unicode
As it's used only by a tiny minority of the Git developer population,
this script does not belong in the main Git source directory.

Move it into contrib/ and adjust the paths to account for the new
location.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Committed by Junio C Hamano
Parent 32c239d1fb
Commit f3eb54920e
contrib/update-unicode/.gitignore (normal file, 3 lines added)
@@ -0,0 +1,3 @@
uniset/
UnicodeData.txt
EastAsianWidth.txt
contrib/update-unicode/README (normal file, 20 lines added)
@@ -0,0 +1,20 @@
TL;DR: Run update_unicode.sh after the publication of a new Unicode
standard and commit the resulting unicode_width.h file.

The long version
================

The Git source code ships the file unicode_width.h, which contains
tables of zero-width and double-width Unicode code points.
These tables are generated using update_unicode.sh in this directory.
update_unicode.sh itself uses a third-party tool, uniset, to query two
Unicode data files for the interesting code points.

On first run, update_unicode.sh clones uniset from GitHub and builds it.
This requires a current-ish version of autoconf (2.69 works as of
December 2016).

On each run, update_unicode.sh checks whether more recent Unicode data
files are available from the Unicode consortium, and rebuilds the header
unicode_width.h with the new data. The new header can then be
committed.
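Review note, not part of the commit: as written, update_unicode.sh downloads UnicodeData.txt and EastAsianWidth.txt only when no local copy exists, so picking up a newer Unicode release seems to require removing the cached copies first. A minimal sketch of that workflow follows. The helper names clean_unicode_data and refresh_unicode_tables are hypothetical, and the sketch assumes it is run from inside a Git working tree with a Git new enough to support `git -C`.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of the commit.

# Remove the cached data files (the names listed in .gitignore) from the
# directory $1, so that the next run of update_unicode.sh downloads them
# again. The uniset/ checkout is left alone.
clean_unicode_data () {
	rm -f "$1/UnicodeData.txt" "$1/EastAsianWidth.txt"
}

# Refresh the tables: drop the cache, regenerate the header, and show a
# summary of what changed. Assumes the current directory is inside the
# Git working tree.
refresh_unicode_tables () {
	top=$(git rev-parse --show-toplevel) &&
	dir=$top/contrib/update-unicode &&
	clean_unicode_data "$dir" &&
	"$dir/update_unicode.sh" &&
	git -C "$top" diff --stat -- unicode_width.h
}
```

Calling refresh_unicode_tables and then reviewing and committing unicode_width.h would correspond to the "TL;DR" step of the README.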
contrib/update-unicode/update_unicode.sh (executable file, 38 lines added)
@@ -0,0 +1,38 @@
#!/bin/sh
#See http://www.unicode.org/reports/tr44/
#
#Me  Enclosing_Mark   an enclosing combining mark
#Mn  Nonspacing_Mark  a nonspacing combining mark (zero advance width)
#Cf  Format           a format control character
#
cd "$(dirname "$0")"
UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
(
	if ! test -f UnicodeData.txt; then
		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
	fi &&
	if ! test -f EastAsianWidth.txt; then
		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
	fi &&
	if ! test -d uniset; then
		git clone https://github.com/depp/uniset.git
	fi &&
	(
		cd uniset &&
		if ! test -x uniset; then
			autoreconf -i &&
			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
		fi &&
		make
	) &&
	UNICODE_DIR=. && export UNICODE_DIR &&
	cat >"$UNICODEWIDTH_H" <<-EOF
static const struct interval zero_width[] = {
$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
  grep -v plane)
};
static const struct interval double_width[] = {
$(uniset/uniset --32 eaw:F,W)
};
EOF
)
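Review note, not part of the commit: the header is produced by a here-document whose body embeds command substitutions, so the output of uniset lands between the opening and closing lines of each C array. The sketch below isolates that pattern so it can be tried without building uniset. fake_uniset and write_table are hypothetical names, and the two intervals are illustrative values rather than real uniset output.

```shell
#!/bin/sh
# Sketch only. fake_uniset is a hypothetical stand-in for uniset/uniset; the
# two intervals it prints are illustrative values, not real uniset output.
fake_uniset () {
	printf '{ 0x0300, 0x036F },\n{ 0x200B, 0x200F },\n'
}

# Write a C table named $1 to the file $2. The here-document delimiter is
# unquoted, so $1 and $(...) are expanded inside the body. The body is kept
# flush left, so it behaves the same with <<EOF and with <<-EOF.
write_table () {
cat >"$2" <<EOF
static const struct interval $1[] = {
$(fake_uniset)
};
EOF
}
```

For example, `write_table zero_width /tmp/demo.h` writes a four-line array definition. One property of this pattern worth keeping in mind: the exit status of a command inside `$(...)` in a here-document is not checked, so if uniset failed, cat would most likely still succeed and write a header with an empty table.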