author | Simon McVittie <simon.mcvittie@collabora.co.uk> | 2013-04-22 15:36:32 +0100 |
---|---|---|
committer | Simon McVittie <simon.mcvittie@collabora.co.uk> | 2013-04-22 15:36:32 +0100 |
commit | 6b2add5e70252c513f506f84cc386f47953df48d (patch) | |
tree | cb5390549936a81565de69ff5ce5039511a99db8 /test | |
parent | 540e5692e07d48fb41a4e977e0c9078fa19bd677 (diff) |
Accept non-characters when validating Unicode
Unicode Corrigendum #9 clarifies that the non-characters U+nFFFE
(for n in the range 0 to 0x10), U+nFFFF (for n in the same range),
and U+FDD0..U+FDEF are valid for interchange, and their presence
does not make a string ill-formed.
GLib 2.36 made the corresponding change in its definition of UTF-8
as used by g_utf8_validate() and similar functions.
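As an illustration of the code points named above, the following is a minimal sketch of the classification, not the code used by dbus or GLib. The function names `is_noncharacter`, `is_surrogate` and `is_valid_scalar` are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if cp is one of the 66 Unicode non-characters,
 * i.e. U+FDD0..U+FDEF, or U+nFFFE / U+nFFFF for n in 0..0x10. */
static bool
is_noncharacter (uint32_t cp)
{
  if (cp > 0x10FFFF)
    return false;               /* outside the Unicode code space */

  if (cp >= 0xFDD0 && cp <= 0xFDEF)
    return true;

  /* The low 16 bits are FFFE or FFFF in every plane */
  return (cp & 0xFFFE) == 0xFFFE;
}

/* Hypothetical helper: true if cp is a UTF-16 surrogate (U+D800..U+DFFF) */
static bool
is_surrogate (uint32_t cp)
{
  return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Hypothetical helper: the rule after Corrigendum #9. Non-characters are
 * accepted; only surrogates and out-of-range values are rejected. */
static bool
is_valid_scalar (uint32_t cp)
{
  return cp <= 0x10FFFF && !is_surrogate (cp);
}
```

Under this sketch, `is_valid_scalar (0xFFFE)` is true even though `is_noncharacter (0xFFFE)` is also true, which is the behaviour this commit tests for.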
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=63072
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Diffstat (limited to 'test')
-rw-r--r-- | test/syntax.c | 6 |
1 files changed, 4 insertions, 2 deletions
diff --git a/test/syntax.c b/test/syntax.c
index 88db9638..e26b3643 100644
--- a/test/syntax.c
+++ b/test/syntax.c
@@ -178,12 +178,14 @@ const char * const invalid_single_signatures[] = {
 const char * const valid_strings[] = {
   "",
-  "\xc2\xa9",
+  "\xc2\xa9",     /* UTF-8 (c) symbol */
+  "\xef\xbf\xbe", /* U+FFFE is reserved but Corrigendum 9 says it's OK */
   NULL
 };

 const char * const invalid_strings[] = {
-  "\xa9",
+  "\xa9",         /* Latin-1 (c) symbol */
+  "\xed\xa0\x80", /* UTF-16 surrogates are not valid in UTF-8 */
   NULL
 };
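To see why the two new test strings fall on opposite sides of the line, it may help to decode them by hand. The sketch below handles only the three-byte UTF-8 form and does not reject overlong encodings. The name `decode_utf8_3` is hypothetical and is not part of dbus or GLib.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 3-byte UTF-8 sequence of the form
 * 1110xxxx 10xxxxxx 10xxxxxx. Returns the code point, or -1 if the bytes
 * do not have that shape. Overlong forms are not checked in this sketch. */
static int32_t
decode_utf8_3 (const unsigned char *s)
{
  /* Short-circuit evaluation stops at the first mismatching byte, so a
   * NUL-terminated string shorter than 3 bytes is never read past its end. */
  if ((s[0] & 0xF0) != 0xE0 ||
      (s[1] & 0xC0) != 0x80 ||
      (s[2] & 0xC0) != 0x80)
    return -1;

  return ((int32_t) (s[0] & 0x0F) << 12) |
         ((int32_t) (s[1] & 0x3F) << 6) |
         (int32_t) (s[2] & 0x3F);
}
```

With this helper, `"\xef\xbf\xbe"` decodes to 0xFFFE, a non-character that is now accepted, while `"\xed\xa0\x80"` decodes to 0xD800, which lies in the surrogate range U+D800..U+DFFF and remains invalid. The lone byte `"\xa9"` is a continuation byte with no lead byte, so it does not decode at all.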