| Age | Commit message | Author | Files | Lines |
|---|---|---|---|---|
| 2023-11-15 | gitlab-ci: CI is now forbidden on MR run by passing-by contributors. [HEAD, master] | Jehan | 1 | -0/+7 |
| 2023-11-15 | Add notepad++ to readme | Jaroslav Lobačevski | 1 | -0/+1 |
| 2023-07-17 | src: handle long sequences of characters. | Jehan | 1 | -10/+21 |
| 2023-07-17 | Issue #33: crafted sequence of bytes triggers memory write past the bounds of… | Jehan | 1 | -2/+13 |
| 2023-07-17 | src: fix mismatched new [] / delete. | Jehan | 1 | -2/+2 |
| 2023-07-17 | Issue #32: Global buffer read overflow in `GetOrderFromCodePoint`. | Jehan | 1 | -13/+8 |
| 2023-07-17 | CMake: enable ASAN in Debug builds. | Jehan | 1 | -1/+3 |
| 2022-12-20 | script: improve a bit create-table.py and regenerate the Georgian charsets. | Jehan | 3 | -36/+53 |
| 2022-12-20 | script, src, test: new Georgian support. | Jehan | 15 | -2/+779 |
| 2022-12-20 | script: new create-table script. | Jehan | 1 | -0/+137 |
| 2022-12-20 | script: update the README. | Jehan | 1 | -6/+5 |
| 2022-12-20 | script, src, test: adding Catalan support. | Jehan | 13 | -2/+543 |
| 2022-12-19 | src: new Big5 detection implementation. | Jehan | 6 | -1060/+118 |
| 2022-12-18 | Issue #21: Greek CP737 support. | Jehan | 8 | -380/+489 |
| 2022-12-18 | script: fix a notice message. | Jehan | 1 | -1/+1 |
| 2022-12-18 | script: add a requirements.txt for our generation script. | Jehan | 2 | -0/+4 |
| 2022-12-18 | script, src: generate more code for language and sequence model listing. | Jehan | 46 | -895/+1211 |
| 2022-12-17 | README: missing UTF-8 support listed on several languages. | Jehan | 1 | -0/+4 |
| 2022-12-17 | script, src, test: add Serbian support. | Jehan | 14 | -2/+601 |
| 2022-12-17 | src, script: add Macedonian support. | Jehan | 15 | -2/+646 |
| 2022-12-17 | script, src: regenerate Russian models and add UTF-8/Russian support. | Jehan | 14 | -276/+943 |
| 2022-12-17 | script, src, test: add Ukrainian support. | Jehan | 13 | -3/+609 |
| 2022-12-17 | script, src, test: adding Belarusian support. | Jehan | 14 | -2/+524 |
| 2022-12-17 | script, src, test: Bulgarian language models added. | Jehan | 13 | -193/+700 |
| 2022-12-17 | script: add an error handling for when iconv fail to convert from a codepoint. | Jehan | 1 | -0/+3 |
| 2022-12-16 | test: adding 2 tests for Hebrew/IBM862 recognition. [wip/Jehan/improved-API] | Jehan | 2 | -0/+2 |
| 2022-12-16 | Issue #22: Hebrew CP862 support. | Jehan | 8 | -544/+661 |
| 2022-12-16 | test: add ability to have several tests per charsets. | Jehan | 1 | -1/+2 |
| 2022-12-15 | test: no:utf-8 is actually working now, after the last model script fix… | Jehan | 1 | -2/+1 |
| 2022-12-15 | src: all language models now rebuilt after the fix. | Jehan | 61 | -11347/+11254 |
| 2022-12-15 | script: fix BuildLangModel.py. | Jehan | 1 | -4/+6 |
| 2022-12-14 | test: finally add English/UTF-8 test file. | Jehan | 1 | -0/+1 |
| 2022-12-14 | scripts: all language models rebuilt with the new ratio data. | Jehan | 63 | -8583/+11714 |
| 2022-12-14 | script: model-building script updated to produce the 2 new ratios… | Jehan | 1 | -1/+26 |
| 2022-12-14 | src: improve algorithm for confidence computation. | Jehan | 2 | -5/+31 |
| 2022-12-14 | src: when checking for candidates, make sure we haven't any unprocessed… | Jehan | 1 | -1/+8 |
| 2022-12-14 | script, src: rebuild the English model. | Jehan | 2 | -331/+302 |
| 2022-12-14 | src: add a --language\|-l option to the uchardet CLI tool. | Jehan | 1 | -9/+30 |
| 2022-12-14 | src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. | Jehan | 5 | -15/+15 |
| 2022-12-14 | test: temporarily disable the Norwegian/UTF-8 test. | Jehan | 1 | -1/+2 |
| 2022-12-14 | src: process pending language data when we are going to pass buffer size. | Jehan | 1 | -0/+11 |
| 2022-12-14 | script, src: rebuild the Danish model. | Jehan | 4 | -223/+341 |
| 2022-12-14 | script, src: update Norwegian model with the new language features. | Jehan | 6 | -181/+352 |
| 2022-12-14 | script: further fixing BuildLangModel.py. | Jehan | 1 | -0/+2 |
| 2022-12-14 | script: improve a bit the management of use_ascii option. | Jehan | 1 | -7/+5 |
| 2022-12-14 | script: work around recent issue of python wikipedia module. | Jehan | 1 | -3/+3 |
| 2022-12-14 | test: improve test error output even more. | Jehan | 1 | -8/+61 |
| 2022-12-14 | test: add stderr logging when a test fails. | Jehan | 1 | -0/+7 |
| 2022-12-14 | script, src: add English language model. | Jehan | 10 | -2/+545 |
| 2022-12-14 | src: drop less of UTF-8 confidence even with few non-multibyte chars. | Jehan | 1 | -2/+3 |