uchardet/uchardet - uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding and attempts to determine the encoding of the text. Returned encoding names are iconv-compatible. (mirrored from https://gitlab.freedesktop.org/uchardet/uchardet)

Date	Commit message	Author	Files	Lines (-/+)
2021-03-22	gitlab-ci: make test on Windows too. [wip/Jehan/improved-API-make-test-win64]	Jehan	1	-1/+2
2021-03-22	test: fix test binary build for Windows.	Jehan	1	-2/+9
2021-03-22	src: reset shortcut charset/language on Reset().	Jehan	1	-0/+8
2021-03-22	src: do not test with nsLatin1Prober anymore.	Jehan	1	-2/+9
2021-03-22	src: improve confidence computation (generic and single-byte charset).	Jehan	3	-26/+31
2021-03-22	script: generate a more complete set of frequent characters when a range is set.	Jehan	1	-19/+16
2021-03-22	script, src: regenerate the Thai model.	Jehan	3	-288/+325
2021-03-21	src, script: fix the order of characters for Vietnamese.	Jehan	2	-376/+356
2021-03-21	src, script: add the concept of alphabet_mapping in language models.	Jehan	4	-237/+192
2021-03-21	script: regenerate Slovak and Slovene with better alphabet support.	Jehan	6	-558/+587
2021-03-21	script: fix a stupid bug giving the same ratio to all frequent characters.	Jehan	1	-1/+1
2021-03-21	script, src: regenerate the Vietnamese model.	Jehan	3	-229/+383
2021-03-20	src: fix negative confidence wrapping around because of unsigned int.	Jehan	1	-1/+1
2021-03-20	script, src: remove generated statistics data for Korean.	Jehan	5	-1315/+2
2021-03-20	src: new nsCJKDetector specifically for Chinese/Japanese/Korean recognition.	Jehan	4	-1/+313
2021-03-19	README: fix a duplicate.	Jehan	1	-1/+1
2021-03-19	Update README.	Jehan	1	-20/+105
2021-03-19	src: consider any combination with a non-frequent character as a sequence.	Jehan	1	-0/+10
2021-03-19	src: add Hindi/UTF-8 support.	Jehan	8	-2/+501
2021-03-19	src: improve confidence computation.	Jehan	2	-26/+108
2021-03-19	script: fix BuildLangModel.py a bit when use_ascii is True.	Jehan	1	-3/+8
2021-03-19	script, src: add generic Korean model.	Jehan	8	-41/+2223
2021-03-18	src, test: fix the new Johab prober and add a test.	Jehan	4	-8/+15
2021-03-17	src: build new charset prober for Johab Korean.	Jehan	6	-6/+8
2021-03-17	add a charset prober for Johab Korean.	LSY	9	-2/+1029
2021-03-17	script, src: generate the Hebrew models.	Jehan	10	-172/+642
2021-03-17	test: 4 new tests for UTF-8.	Jehan	4	-0/+8
2021-03-17	src: drop the SURE_YES confidence for character distribution probers.	Jehan	1	-1/+1
2021-03-17	src: do not shortcut UTF-8 detection too early.	Jehan	1	-1/+3
2021-03-17	src: nsEscCharsetProber also returns the correct language.	Jehan	6	-6/+21
2021-03-17	src: make nsMBCSGroupProber report all valid candidates.	Jehan	4	-99/+202
2021-03-17	src: allow for nsCharSetProber to return several candidates.	Jehan	27	-96/+110
2021-03-17	src: nsMBCSGroupProber confidence weighted by language confidence.	Jehan	1	-2/+16
2021-03-17	src: tweak the language detection confidence again.	Jehan	1	-13/+9
2021-03-17	test: update unit test to check detected languages.	Jehan	1	-21/+44
2021-03-17	src: reset language detectors when resetting a nsMBCSGroupProber.	Jehan	1	-0/+6
2021-03-17	src, script: regenerate all existing language models.	Jehan	43	-4708/+5426
2021-03-16	Using the generic language detector in UTF-8 detection.	Jehan	29	-42/+234
2021-03-16	New generic language detector class.	Jehan	3	-0/+300
2021-03-16	Rebuild a bunch of language models.	Jehan	14	-1271/+1617
2021-03-14	src: add a --weight option to the CLI tool.	Jehan	1	-13/+72
2021-03-14	src: new weight concept in the C API.	Jehan	3	-4/+86
2021-03-14	src: fix the usage of `uchardet` tool.	Jehan	1	-1/+1
2021-03-14	src: `uchardet` tool now shows the language code in verbose mode.	Jehan	1	-3/+9
2021-03-14	script: update BuildLangModel.py to the updated SequenceModel struct.	Jehan	1	-1/+2
2021-03-14	src: new API to get the detected language.	Jehan	51	-104/+276
2021-03-14	test: fix test script to use the new API and get rid of a build warning.	Jehan	1	-1/+1
2021-03-14	src: new option --verbose|-V in the `uchardet` CLI tool.	Jehan	1	-10/+38
2021-03-14	src: new API to get all candidates and their confidence.	Jehan	3	-3/+51
2021-03-14	src: now reporting encoding+confidence and keeping a list.	Jehan	3	-26/+62