Age | Commit message | Author | Files | Lines
2022-12-14 | test: fix test binary build for Windows. | Jehan | 1 | -2/+9
2022-12-14 | src: reset shortcut charset/language on Reset(). | Jehan | 1 | -0/+8
2022-12-14 | src: do not test with nsLatin1Prober anymore. | Jehan | 1 | -2/+9
2022-12-14 | src: improve confidence computation (generic and single-byte charset). | Jehan | 3 | -26/+31
2022-12-14 | script: generate more complete frequent characters when range is set. | Jehan | 1 | -19/+16
2022-12-14 | script, src: regenerate the Thai model. | Jehan | 3 | -288/+325
2022-12-14 | src, script: fix the order of characters for Vietnamese. | Jehan | 2 | -376/+356
2022-12-14 | src, script: add concept of alphabet_mapping in language models. | Jehan | 4 | -237/+192
2022-12-14 | script: regenerate Slovak and Slovene with better alphabet support. | Jehan | 6 | -558/+587
2022-12-14 | script: fix a stupid bug making same ratio for all frequent characters. | Jehan | 1 | -1/+1
2022-12-14 | script, src: regenerate the Vietnamese model. | Jehan | 3 | -229/+383
2022-12-14 | src: fix negative confidence wrapping around because of unsigned int. | Jehan | 1 | -1/+1
2022-12-14 | script, src: remove generated statistics data for Korean. | Jehan | 5 | -1315/+2
2022-12-14 | src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. | Jehan | 4 | -1/+313
2022-12-14 | README: fix a duplicate. | Jehan | 1 | -1/+1
2022-12-14 | Update README. | Jehan | 1 | -20/+105
2022-12-14 | src: consider any combination with a non-frequent character as sequence. | Jehan | 1 | -0/+10
2022-12-14 | src: add Hindi/UTF-8 support. | Jehan | 8 | -2/+501
2022-12-14 | src: improve confidence computation. | Jehan | 2 | -26/+108
2022-12-14 | script: fix a bit BuildLangModel.py when use_ascii is True. | Jehan | 1 | -3/+8
2022-12-14 | script, src: add generic Korean model. | Jehan | 8 | -41/+2223
2022-12-14 | src, test: fix the new Johab prober and add a test. | Jehan | 4 | -8/+15
2022-12-14 | src: build new charset prober for Johab Korean. | Jehan | 6 | -6/+8
2022-12-14 | add charset prober for Johab Korean | LSY | 9 | -2/+1029
2022-12-14 | script, src: generate the Hebrew models. | Jehan | 10 | -172/+642
2022-12-14 | test: 4 new tests for UTF-8. | Jehan | 4 | -0/+8
2022-12-14 | src: drop the SURE_YES confidence for character distribution probers. | Jehan | 1 | -1/+1
2022-12-14 | src: do not shortcut UTF-8 detection too early. | Jehan | 1 | -1/+3
2022-12-14 | src: nsEscCharsetProber also returns the correct language. | Jehan | 6 | -6/+21
2022-12-14 | src: make nsMBCSGroupProber report all valid candidates. | Jehan | 4 | -99/+202
2022-12-14 | src: allow for nsCharSetProber to return several candidates. | Jehan | 27 | -96/+110
2022-12-14 | src: nsMBCSGroupProber confidence weighed by language confidence. | Jehan | 1 | -2/+16
2022-12-14 | src: tweak again the language detection confidence. | Jehan | 1 | -13/+9
2022-12-14 | test: update unit test to check detected languages. | Jehan | 1 | -23/+43
2022-12-14 | src: reset language detectors when resetting a nsMBCSGroupProber. | Jehan | 1 | -0/+6
2022-12-14 | src, script: regenerate all existing language models. | Jehan | 43 | -4708/+5426
2022-12-14 | Using the generic language detector in UTF-8 detection. | Jehan | 29 | -42/+234
2022-12-14 | New generic language detector class. | Jehan | 3 | -0/+300
2022-12-14 | Rebuild a bunch of language models. | Jehan | 14 | -1401/+1617
2022-12-14 | src: add a --weight option to the CLI tool. | Jehan | 1 | -13/+72
2022-12-14 | src: new weight concept in the C API. | Jehan | 3 | -4/+86
2022-12-14 | src: fix the usage of `uchardet` tool. | Jehan | 1 | -1/+1
2022-12-14 | src: `uchardet` tool now shows the language code in verbose mode. | Jehan | 1 | -3/+9
2022-12-14 | script: update BuildLangModel.py to updated SequenceModel struct. | Jehan | 1 | -1/+2
2022-12-14 | src: new API to get the detected language. | Jehan | 51 | -104/+276
2022-12-14 | test: fix test script to use the new API and get rid of build warning. | Jehan | 1 | -1/+1
2022-12-14 | src: new option --verbose|-V in the `uchardet` CLI tool. | Jehan | 1 | -10/+38
2022-12-14 | src: new API to get all candidates and their confidence. | Jehan | 3 | -3/+51
2022-12-14 | src: now reporting encoding+confidence and keeping a list. | Jehan | 3 | -26/+62
2022-12-08 | README, doc: some README and release procedure updates. | Jehan | 2 | -9/+13