| Age | Commit message | Author | Files | Lines |
|---|---|---|---|---|
| 2022-12-16 | test: adding 2 tests for Hebrew/IBM862 recognition. [wip/Jehan/improved-API] | Jehan | 2 | -0/+2 |
| 2022-12-16 | Issue #22: Hebrew CP862 support. | Jehan | 8 | -544/+661 |
| 2022-12-16 | test: add ability to have several tests per charsets. | Jehan | 1 | -1/+2 |
| 2022-12-15 | test: no:utf-8 is actually working now, after the last model script fix… | Jehan | 1 | -2/+1 |
| 2022-12-15 | src: all language models now rebuilt after the fix. | Jehan | 61 | -11347/+11254 |
| 2022-12-15 | script: fix BuildLangModel.py. | Jehan | 1 | -4/+6 |
| 2022-12-14 | test: finally add English/UTF-8 test file. | Jehan | 1 | -0/+1 |
| 2022-12-14 | scripts: all language models rebuilt with the new ratio data. | Jehan | 63 | -8583/+11714 |
| 2022-12-14 | script: model-building script updated to produce the 2 new ratios… | Jehan | 1 | -1/+26 |
| 2022-12-14 | src: improve algorithm for confidence computation. | Jehan | 2 | -5/+31 |
| 2022-12-14 | src: when checking for candidates, make sure we haven't any unprocessed… | Jehan | 1 | -1/+8 |
| 2022-12-14 | script, src: rebuild the English model. | Jehan | 2 | -331/+302 |
| 2022-12-14 | src: add a --language\|-l option to the uchardet CLI tool. | Jehan | 1 | -9/+30 |
| 2022-12-14 | src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. | Jehan | 5 | -15/+15 |
| 2022-12-14 | test: temporarily disable the Norwegian/UTF-8 test. | Jehan | 1 | -1/+2 |
| 2022-12-14 | src: process pending language data when we are going to pass buffer size. | Jehan | 1 | -0/+11 |
| 2022-12-14 | script, src: rebuild the Danish model. | Jehan | 4 | -223/+341 |
| 2022-12-14 | script, src: update Norwegian model with the new language features. | Jehan | 6 | -181/+352 |
| 2022-12-14 | script: further fixing BuildLangModel.py. | Jehan | 1 | -0/+2 |
| 2022-12-14 | script: improve a bit the management of use_ascii option. | Jehan | 1 | -7/+5 |
| 2022-12-14 | script: work around recent issue of python wikipedia module. | Jehan | 1 | -3/+3 |
| 2022-12-14 | test: improve test error output even more. | Jehan | 1 | -8/+61 |
| 2022-12-14 | test: add stderr logging when a test fails. | Jehan | 1 | -0/+7 |
| 2022-12-14 | script, src: add English language model. | Jehan | 10 | -2/+545 |
| 2022-12-14 | src: drop less of UTF-8 confidence even with few non-multibyte chars. | Jehan | 1 | -2/+3 |
| 2022-12-14 | test: fix test binary build for Windows. | Jehan | 1 | -2/+9 |
| 2022-12-14 | src: reset shortcut charset/language on Reset(). | Jehan | 1 | -0/+8 |
| 2022-12-14 | src: do not test with nsLatin1Prober anymore. | Jehan | 1 | -2/+9 |
| 2022-12-14 | src: improve confidence computation (generic and single-byte charset). | Jehan | 3 | -26/+31 |
| 2022-12-14 | script: generate more complete frequent characters when range is set. | Jehan | 1 | -19/+16 |
| 2022-12-14 | script, src: regenerate the Thai model. | Jehan | 3 | -288/+325 |
| 2022-12-14 | src, script: fix the order of characters for Vietnamese. | Jehan | 2 | -376/+356 |
| 2022-12-14 | src, script: add concept of alphabet_mapping in language models. | Jehan | 4 | -237/+192 |
| 2022-12-14 | script: regenerate Slovak and Slovene with better alphabet support. | Jehan | 6 | -558/+587 |
| 2022-12-14 | script: fix a stupid bug making same ratio for all frequent characters. | Jehan | 1 | -1/+1 |
| 2022-12-14 | script, src: regenerate the Vietnamese model. | Jehan | 3 | -229/+383 |
| 2022-12-14 | src: fix negative confidence wrapping around because of unsigned int. | Jehan | 1 | -1/+1 |
| 2022-12-14 | script, src: remove generated statistics data for Korean. | Jehan | 5 | -1315/+2 |
| 2022-12-14 | src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. | Jehan | 4 | -1/+313 |
| 2022-12-14 | README: fix a duplicate. | Jehan | 1 | -1/+1 |
| 2022-12-14 | Update README. | Jehan | 1 | -20/+105 |
| 2022-12-14 | src: consider any combination with a non-frequent character as sequence. | Jehan | 1 | -0/+10 |
| 2022-12-14 | src: add Hindi/UTF-8 support. | Jehan | 8 | -2/+501 |
| 2022-12-14 | src: improve confidence computation. | Jehan | 2 | -26/+108 |
| 2022-12-14 | script: fix a bit BuildLangModel.py when use_ascii is True. | Jehan | 1 | -3/+8 |
| 2022-12-14 | script, src: add generic Korean model. | Jehan | 8 | -41/+2223 |
| 2022-12-14 | src, test: fix the new Johab prober and add a test. | Jehan | 4 | -8/+15 |
| 2022-12-14 | src: build new charset prober for Johab Korean. | Jehan | 6 | -6/+8 |
| 2022-12-14 | add charset prober for Johab Korean | LSY | 9 | -2/+1029 |
| 2022-12-14 | script, src: generate the Hebrew models. | Jehan | 10 | -172/+642 |