uchardet/uchardet - uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding and attempts to determine the encoding of the text. Returned encoding names are iconv-compatible. (mirrored from https://gitlab.freedesktop.org/uchardet/uchardet)

Age	Commit message	Author	Files	Lines
2023-07-17	src: handle long sequences of characters.	Jehan	1	-10/+21
2023-07-17	Issue #33: crafted sequence of bytes triggers memory write past the bounds of…	Jehan	1	-2/+13
2023-07-17	src: fix mismatched new [] / delete.	Jehan	1	-2/+2
2023-07-17	Issue #32: Global buffer read overflow in `GetOrderFromCodePoint`.	Jehan	1	-13/+8
2022-12-20	script, src, test: new Georgian support.	Jehan	6	-2/+299
2022-12-20	script, src, test: adding Catalan support.	Jehan	6	-2/+218
2022-12-19	src: new Big5 detection implementation.	Jehan	6	-1060/+118
2022-12-18	Issue #21: Greek CP737 support.	Jehan	3	-152/+196
2022-12-18	script, src: generate more code for language and sequence model listing.	Jehan	43	-202/+375
2022-12-17	script, src, test: add Serbian support.	Jehan	8	-2/+285
2022-12-17	src, script: add Macedonian support.	Jehan	8	-2/+331
2022-12-17	script, src: regenerate Russian models and add UTF-8/Russian support.	Jehan	6	-276/+320
2022-12-17	script, src, test: add Ukrainian support.	Jehan	8	-3/+267
2022-12-17	script, src, test: adding Belarusian support.	Jehan	8	-2/+213
2022-12-17	script, src, test: Bulgarian language models added.	Jehan	6	-193/+225
2022-12-16	Issue #22: Hebrew CP862 support.	Jehan	4	-279/+326
2022-12-15	src: all language models now rebuilt after the fix.	Jehan	30	-3329/+3272
2022-12-14	scripts: all language models rebuilt with the new ratio data.	Jehan	30	-3298/+3461
2022-12-14	src: improve algorithm for confidence computation.	Jehan	2	-5/+31
2022-12-14	src: when checking for candidates, make sure we haven't any unprocessed…	Jehan	1	-1/+8
2022-12-14	script, src: rebuild the English model.	Jehan	1	-167/+67
2022-12-14	src: add a --language|-l option to the uchardet CLI tool.	Jehan	1	-9/+30
2022-12-14	src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/.	Jehan	4	-14/+14
2022-12-14	src: process pending language data when we are going to pass buffer size.	Jehan	1	-0/+11
2022-12-14	script, src: rebuild the Danish model.	Jehan	3	-84/+118
2022-12-14	script, src: update Norwegian model with the new language features.	Jehan	4	-180/+117
2022-12-14	script, src: add English language model.	Jehan	8	-2/+300
2022-12-14	src: drop less of UTF-8 confidence even with few non-multibyte chars.	Jehan	1	-2/+3
2022-12-14	src: reset shortcut charset/language on Reset().	Jehan	1	-0/+8
2022-12-14	src: do not test with nsLatin1Prober anymore.	Jehan	1	-2/+9
2022-12-14	src: improve confidence computation (generic and single-byte charset).	Jehan	3	-26/+31
2022-12-14	script, src: regenerate the Thai model.	Jehan	1	-169/+194
2022-12-14	src, script: fix the order of characters for Vietnamese.	Jehan	1	-266/+252
2022-12-14	src, script: add concept of alphabet_mapping in language models.	Jehan	1	-101/+105
2022-12-14	script: regenerate Slovak and Slovene with better alphabet support.	Jehan	2	-283/+287
2022-12-14	script, src: regenerate the Vietnamese model.	Jehan	1	-159/+266
2022-12-14	src: fix negative confidence wrapping around because of unsigned int.	Jehan	1	-1/+1
2022-12-14	script, src: remove generated statistics data for Korean.	Jehan	4	-1315/+0
2022-12-14	src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition.	Jehan	4	-1/+313
2022-12-14	src: consider any combination with a non-frequent character as sequence.	Jehan	1	-0/+10
2022-12-14	src: add Hindi/UTF-8 support.	Jehan	5	-2/+233
2022-12-14	src: improve confidence computation.	Jehan	2	-26/+108
2022-12-14	script, src: add generic Korean model.	Jehan	5	-1/+1316
2022-12-14	src, test: fix the new Johab prober and add a test.	Jehan	3	-8/+14
2022-12-14	src: build new charset prober for Johab Korean.	Jehan	6	-6/+8
2022-12-14	add charset prober for Johab Korean	LSY	9	-2/+1029
2022-12-14	script, src: generate the Hebrew models.	Jehan	6	-172/+245
2022-12-14	src: drop the SURE_YES confidence for character distribution probers.	Jehan	1	-1/+1
2022-12-14	src: do not shortcut UTF-8 detection too early.	Jehan	1	-1/+3
2022-12-14	src: nsEscCharsetProber also returns the correct language.	Jehan	6	-6/+21