script: add error handling for when iconv fails to convert a codepoint.

This can happen when our character set table is wrong, but also when iconv itself has a bug or an incomplete charset table. For instance, I was trying to implement IBM880 for #29, but iconv was missing a few codepoints: it seems to think that 0x45 (є), 0x55 (ў) and 0x74 (Ў) are illegal in IBM880 (and possibly others), whereas the information we have says they are valid. And Python does not support this character set at all. This test will help discover the issue earlier, rather than breaking a few lines later because `iconv` failed and returned an empty string, making ord() fail with a TypeError exception. See: https://gitlab.freedesktop.org/uchardet/uchardet/-/issues/29#note_1691847
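
The failure mode described above can be illustrated with a minimal, standalone sketch. It does not call `iconv`; it only assumes that a failed conversion yields an empty string, as stated in the message. The helper name `check_converted` is hypothetical and is not part of `BuildLangModel.py`, and it raises an exception where the script itself prints a message and exits.

```python
def check_converted(uchar, cp, charset):
    """Hypothetical helper: reject an empty conversion result early.

    uchar   -- the string obtained from the conversion (may be empty on failure)
    cp      -- the codepoint (byte value) that was converted
    charset -- the charset name, used only for the error message
    """
    if len(uchar) == 0:
        # Fail here, with the codepoint and charset in the message.
        raise ValueError(
            'iconv failed to return a unicode character for codepoint '
            '"{}" in charset {}.'.format(hex(cp), charset))
    return ord(uchar)


# Without such a check, the empty string only fails later, inside ord(),
# with a TypeError that names neither the codepoint nor the charset.
try:
    ord('')
    late_error = None
except TypeError as error:
    late_error = error
```

With the check in place, the error names the offending codepoint and charset directly, instead of surfacing as an unrelated-looking `TypeError` a few lines later.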
author: Jehan <jehan@girinstud.io> 2022-12-17 18:00:22 +0100
committer: Jehan <jehan@girinstud.io> 2022-12-17 18:00:22 +0100
commit: 5e25e93da795c22265befcdc72d1ffd0daed6934 (patch)
tree: 5b8207030092489c6b69dadc0cd70c0e13ae4a0d
parent: 6d31689632b48947f65536444682a487cab722f6 (diff)
1 files changed, 3 insertions, 0 deletions
diff --git a/script/BuildLangModel.py b/script/BuildLangModel.py
index 684ece6..1c94a97 100755
--- a/script/BuildLangModel.py
+++ b/script/BuildLangModel.py
@@ -537,6 +537,9 @@ for charset in charsets:
                     except FileNotFoundError:
                         print('Error: "{}" is not a supported charset by python and `iconv` is not installed.\n')
                         exit(1)
+                    if len(uchar) == 0:
+                        print('TypeError: iconv failed to return a unicode character for codepoint "{}" in charset {}.\n'.format(hex(cp), charset))
+                        exit(1)
                 #if lang.case_mapping and uchar.isupper() and \
                    #len(unicodedata.normalize('NFC', uchar.lower())) == 1:
                    # Unless we encounter special cases of characters with no