uchardet/uchardet - uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding and attempts to determine the encoding of the text. Returned encoding names are iconv-compatible. (mirrored from https://gitlab.freedesktop.org/uchardet/uchardet)

Age	Commit message	Author	Files	Lines
2023-11-15	gitlab-ci: CI is now forbidden on MR run by passing-by contributors. (HEAD, master)	Jehan	1	-0/+7
2023-11-15	Add notepad++ to readme	Jaroslav Lobačevski	1	-0/+1
2023-07-17	src: handle long sequences of characters.	Jehan	1	-10/+21
2023-07-17	Issue #33: crafted sequence of bytes triggers memory write past the bounds of…	Jehan	1	-2/+13
2023-07-17	src: fix mismatched new [] / delete.	Jehan	1	-2/+2
2023-07-17	Issue #32: Global buffer read overflow in `GetOrderFromCodePoint`.	Jehan	1	-13/+8
2023-07-17	CMake: enable ASAN in Debug builds.	Jehan	1	-1/+3
2022-12-20	script: improve a bit create-table.py and regenerate the Georgian charsets.	Jehan	3	-36/+53
2022-12-20	script, src, test: new Georgian support.	Jehan	15	-2/+779
2022-12-20	script: new create-table script.	Jehan	1	-0/+137
2022-12-20	script: update the README.	Jehan	1	-6/+5
2022-12-20	script, src, test: adding Catalan support.	Jehan	13	-2/+543
2022-12-19	src: new Big5 detection implementation.	Jehan	6	-1060/+118
2022-12-18	Issue #21: Greek CP737 support.	Jehan	8	-380/+489
2022-12-18	script: fix a notice message.	Jehan	1	-1/+1
2022-12-18	script: add a requirements.txt for our generation script.	Jehan	2	-0/+4
2022-12-18	script, src: generate more code for language and sequence model listing.	Jehan	46	-895/+1211
2022-12-17	README: missing UTF-8 support listed on several languages.	Jehan	1	-0/+4
2022-12-17	script, src, test: add Serbian support.	Jehan	14	-2/+601
2022-12-17	src, script: add Macedonian support.	Jehan	15	-2/+646
2022-12-17	script, src: regenerate Russian models and add UTF-8/Russian support.	Jehan	14	-276/+943
2022-12-17	script, src, test: add Ukrainian support.	Jehan	13	-3/+609
2022-12-17	script, src, test: adding Belarusian support.	Jehan	14	-2/+524
2022-12-17	script, src, test: Bulgarian language models added.	Jehan	13	-193/+700
2022-12-17	script: add an error handling for when iconv fail to convert from a codepoint.	Jehan	1	-0/+3
2022-12-16	test: adding 2 tests for Hebrew/IBM862 recognition. (wip/Jehan/improved-API)	Jehan	2	-0/+2
2022-12-16	Issue #22: Hebrew CP862 support.	Jehan	8	-544/+661
2022-12-16	test: add ability to have several tests per charsets.	Jehan	1	-1/+2
2022-12-15	test: no:utf-8 is actually working now, after the last model script fix…	Jehan	1	-2/+1
2022-12-15	src: all language models now rebuilt after the fix.	Jehan	61	-11347/+11254
2022-12-15	script: fix BuildLangModel.py.	Jehan	1	-4/+6
2022-12-14	test: finally add English/UTF-8 test file.	Jehan	1	-0/+1
2022-12-14	scripts: all language models rebuilt with the new ratio data.	Jehan	63	-8583/+11714
2022-12-14	script: model-building script updated to produce the 2 new ratios…	Jehan	1	-1/+26
2022-12-14	src: improve algorithm for confidence computation.	Jehan	2	-5/+31
2022-12-14	src: when checking for candidates, make sure we haven't any unprocessed…	Jehan	1	-1/+8
2022-12-14	script, src: rebuild the English model.	Jehan	2	-331/+302
2022-12-14	src: add a --language|-l option to the uchardet CLI tool.	Jehan	1	-9/+30
2022-12-14	src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/.	Jehan	5	-15/+15
2022-12-14	test: temporarily disable the Norwegian/UTF-8 test.	Jehan	1	-1/+2
2022-12-14	src: process pending language data when we are going to pass buffer size.	Jehan	1	-0/+11
2022-12-14	script, src: rebuild the Danish model.	Jehan	4	-223/+341
2022-12-14	script, src: update Norwegian model with the new language features.	Jehan	6	-181/+352
2022-12-14	script: further fixing BuildLangModel.py.	Jehan	1	-0/+2
2022-12-14	script: improve a bit the management of use_ascii option.	Jehan	1	-7/+5
2022-12-14	script: work around recent issue of python wikipedia module.	Jehan	1	-3/+3
2022-12-14	test: improve test error output even more.	Jehan	1	-8/+61
2022-12-14	test: add stderr logging when a test fails.	Jehan	1	-0/+7
2022-12-14	script, src: add English language model.	Jehan	10	-2/+545
2022-12-14	src: drop less of UTF-8 confidence even with few non-multibyte chars.	Jehan	1	-2/+3