Age | Commit message | Author | Files | Lines
2021-03-22 | gitlab-ci: make test on Windows too. [wip/Jehan/improved-API-make-test-win64] | Jehan | 1 | -1/+2
2021-03-22 | test: fix test binary build for Windows. | Jehan | 1 | -2/+9
2021-03-22 | src: reset shortcut charset/language on Reset(). | Jehan | 1 | -0/+8
2021-03-22 | src: do not test with nsLatin1Prober anymore. | Jehan | 1 | -2/+9
2021-03-22 | src: improve confidence computation (generic and single-byte charset). | Jehan | 3 | -26/+31
2021-03-22 | script: generate more complete frequent characters when range is set. | Jehan | 1 | -19/+16
2021-03-22 | script, src: regenerate the Thai model. | Jehan | 3 | -288/+325
2021-03-21 | src, script: fix the order of characters for Vietnamese. | Jehan | 2 | -376/+356
2021-03-21 | src, script: add concept of alphabet_mapping in language models. | Jehan | 4 | -237/+192
2021-03-21 | script: regenerate Slovak and Slovene with better alphabet support. | Jehan | 6 | -558/+587
2021-03-21 | script: fix a stupid bug making same ratio for all frequent characters. | Jehan | 1 | -1/+1
2021-03-21 | script, src: regenerate the Vietnamese model. | Jehan | 3 | -229/+383
2021-03-20 | src: fix negative confidence wrapping around because of unsigned int. | Jehan | 1 | -1/+1
2021-03-20 | script, src: remove generated statistics data for Korean. | Jehan | 5 | -1315/+2
2021-03-20 | src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. | Jehan | 4 | -1/+313
2021-03-19 | README: fix a duplicate. | Jehan | 1 | -1/+1
2021-03-19 | Update README. | Jehan | 1 | -20/+105
2021-03-19 | src: consider any combination with a non-frequent character as sequence. | Jehan | 1 | -0/+10
2021-03-19 | src: add Hindi/UTF-8 support. | Jehan | 8 | -2/+501
2021-03-19 | src: improve confidence computation. | Jehan | 2 | -26/+108
2021-03-19 | script: fix a bit BuildLangModel.py when use_ascii is True. | Jehan | 1 | -3/+8
2021-03-19 | script, src: add generic Korean model. | Jehan | 8 | -41/+2223
2021-03-18 | src, test: fix the new Johab prober and add a test. | Jehan | 4 | -8/+15
2021-03-17 | src: build new charset prober for Johab Korean. | Jehan | 6 | -6/+8
2021-03-17 | add charset prober for Johab Korean | LSY | 9 | -2/+1029
2021-03-17 | script, src: generate the Hebrew models. | Jehan | 10 | -172/+642
2021-03-17 | test: 4 new tests for UTF-8. | Jehan | 4 | -0/+8
2021-03-17 | src: drop the SURE_YES confidence for character distribution probers. | Jehan | 1 | -1/+1
2021-03-17 | src: do not shortcut UTF-8 detection too early. | Jehan | 1 | -1/+3
2021-03-17 | src: nsEscCharsetProber also returns the correct language. | Jehan | 6 | -6/+21
2021-03-17 | src: make nsMBCSGroupProber report all valid candidates. | Jehan | 4 | -99/+202
2021-03-17 | src: allow for nsCharSetProber to return several candidates. | Jehan | 27 | -96/+110
2021-03-17 | src: nsMBCSGroupProber confidence weighed by language confidence. | Jehan | 1 | -2/+16
2021-03-17 | src: tweak again the language detection confidence. | Jehan | 1 | -13/+9
2021-03-17 | test: update unit test to check detected languages. | Jehan | 1 | -21/+44
2021-03-17 | src: reset language detectors when resetting a nsMBCSGroupProber. | Jehan | 1 | -0/+6
2021-03-17 | src, script: regenerate all existing language models. | Jehan | 43 | -4708/+5426
2021-03-16 | Using the generic language detector in UTF-8 detection. | Jehan | 29 | -42/+234
2021-03-16 | New generic language detector class. | Jehan | 3 | -0/+300
2021-03-16 | Rebuild a bunch of language models. | Jehan | 14 | -1271/+1617
2021-03-14 | src: add a --weight option to the CLI tool. | Jehan | 1 | -13/+72
2021-03-14 | src: new weight concept in the C API. | Jehan | 3 | -4/+86
2021-03-14 | src: fix the usage of `uchardet` tool. | Jehan | 1 | -1/+1
2021-03-14 | src: `uchardet` tool now shows the language code in verbose mode. | Jehan | 1 | -3/+9
2021-03-14 | script: update BuildLangModel.py to updated SequenceModel struct. | Jehan | 1 | -1/+2
2021-03-14 | src: new API to get the detected language. | Jehan | 51 | -104/+276
2021-03-14 | test: fix test script to use the new API and get rid of build warning. | Jehan | 1 | -1/+1
2021-03-14 | src: new option --verbose|-V in the `uchardet` CLI tool. | Jehan | 1 | -10/+38
2021-03-14 | src: new API to get all candidates and their confidence. | Jehan | 3 | -3/+51
2021-03-14 | src: now reporting encoding+confidence and keeping a list. | Jehan | 3 | -26/+62