uchardet/uchardet - uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding and attempts to determine the encoding of the text. Returned encoding names are iconv-compatible. (mirrored from https://gitlab.freedesktop.org/uchardet/uchardet)

Age	Commit message	Author	Files	Lines
2022-12-14	test: fix test binary build for Windows.	Jehan	1	-2/+9
2022-12-14	src: reset shortcut charset/language on Reset().	Jehan	1	-0/+8
2022-12-14	src: do not test with nsLatin1Prober anymore.	Jehan	1	-2/+9
2022-12-14	src: improve confidence computation (generic and single-byte charset).	Jehan	3	-26/+31
2022-12-14	script: generate more complete frequent characters when range is set.	Jehan	1	-19/+16
2022-12-14	script, src: regenerate the Thai model.	Jehan	3	-288/+325
2022-12-14	src, script: fix the order of characters for Vietnamese.	Jehan	2	-376/+356
2022-12-14	src, script: add concept of alphabet_mapping in language models.	Jehan	4	-237/+192
2022-12-14	script: regenerate Slovak and Slovene with better alphabet support.	Jehan	6	-558/+587
2022-12-14	script: fix a stupid bug making same ratio for all frequent characters.	Jehan	1	-1/+1
2022-12-14	script, src: regenerate the Vietnamese model.	Jehan	3	-229/+383
2022-12-14	src: fix negative confidence wrapping around because of unsigned int.	Jehan	1	-1/+1
2022-12-14	script, src: remove generated statistics data for Korean.	Jehan	5	-1315/+2
2022-12-14	src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition.	Jehan	4	-1/+313
2022-12-14	README: fix a duplicate.	Jehan	1	-1/+1
2022-12-14	Update README.	Jehan	1	-20/+105
2022-12-14	src: consider any combination with a non-frequent character as sequence.	Jehan	1	-0/+10
2022-12-14	src: add Hindi/UTF-8 support.	Jehan	8	-2/+501
2022-12-14	src: improve confidence computation.	Jehan	2	-26/+108
2022-12-14	script: fix a bit BuildLangModel.py when use_ascii is True.	Jehan	1	-3/+8
2022-12-14	script, src: add generic Korean model.	Jehan	8	-41/+2223
2022-12-14	src, test: fix the new Johab prober and add a test.	Jehan	4	-8/+15
2022-12-14	src: build new charset prober for Johab Korean.	Jehan	6	-6/+8
2022-12-14	add charset prober for Johab Korean	LSY	9	-2/+1029
2022-12-14	script, src: generate the Hebrew models.	Jehan	10	-172/+642
2022-12-14	test: 4 new tests for UTF-8.	Jehan	4	-0/+8
2022-12-14	src: drop the SURE_YES confidence for character distribution probers.	Jehan	1	-1/+1
2022-12-14	src: do not shortcut UTF-8 detection too early.	Jehan	1	-1/+3
2022-12-14	src: nsEscCharsetProber also returns the correct language.	Jehan	6	-6/+21
2022-12-14	src: make nsMBCSGroupProber report all valid candidates.	Jehan	4	-99/+202
2022-12-14	src: allow for nsCharSetProber to return several candidates.	Jehan	27	-96/+110
2022-12-14	src: nsMBCSGroupProber confidence weighed by language confidence.	Jehan	1	-2/+16
2022-12-14	src: tweak again the language detection confidence.	Jehan	1	-13/+9
2022-12-14	test: update unit test to check detected languages.	Jehan	1	-23/+43
2022-12-14	src: reset language detectors when resetting a nsMBCSGroupProber.	Jehan	1	-0/+6
2022-12-14	src, script: regenerate all existing language models.	Jehan	43	-4708/+5426
2022-12-14	Using the generic language detector in UTF-8 detection.	Jehan	29	-42/+234
2022-12-14	New generic language detector class.	Jehan	3	-0/+300
2022-12-14	Rebuild a bunch of language models.	Jehan	14	-1401/+1617
2022-12-14	src: add a --weight option to the CLI tool.	Jehan	1	-13/+72
2022-12-14	src: new weight concept in the C API.	Jehan	3	-4/+86
2022-12-14	src: fix the usage of `uchardet` tool.	Jehan	1	-1/+1
2022-12-14	src: `uchardet` tool now shows the language code in verbose mode.	Jehan	1	-3/+9
2022-12-14	script: update BuildLangModel.py to updated SequenceModel struct.	Jehan	1	-1/+2
2022-12-14	src: new API to get the detected language.	Jehan	51	-104/+276
2022-12-14	test: fix test script to use the new API and get rid of build warning.	Jehan	1	-1/+1
2022-12-14	src: new option --verbose|-V in the `uchardet` CLI tool.	Jehan	1	-10/+38
2022-12-14	src: new API to get all candidates and their confidence.	Jehan	3	-3/+51
2022-12-14	src: now reporting encoding+confidence and keeping a list.	Jehan	3	-26/+62
2022-12-08	README, doc: some README and release procedure updates.	Jehan	2	-9/+13