path: root/src
| Age | Commit message | Author | Files | Lines |
|---|---|---|---|---|
| 2023-07-17 | src: handle long sequences of characters. | Jehan | 1 | -10/+21 |
| 2023-07-17 | Issue #33: crafted sequence of bytes triggers memory write past the bounds of… | Jehan | 1 | -2/+13 |
| 2023-07-17 | src: fix mismatched new [] / delete. | Jehan | 1 | -2/+2 |
| 2023-07-17 | Issue #32: Global buffer read overflow in `GetOrderFromCodePoint`. | Jehan | 1 | -13/+8 |
| 2022-12-20 | script, src, test: new Georgian support. | Jehan | 6 | -2/+299 |
| 2022-12-20 | script, src, test: adding Catalan support. | Jehan | 6 | -2/+218 |
| 2022-12-19 | src: new Big5 detection implementation. | Jehan | 6 | -1060/+118 |
| 2022-12-18 | Issue #21: Greek CP737 support. | Jehan | 3 | -152/+196 |
| 2022-12-18 | script, src: generate more code for language and sequence model listing. | Jehan | 43 | -202/+375 |
| 2022-12-17 | script, src, test: add Serbian support. | Jehan | 8 | -2/+285 |
| 2022-12-17 | src, script: add Macedonian support. | Jehan | 8 | -2/+331 |
| 2022-12-17 | script, src: regenerate Russian models and add UTF-8/Russian support. | Jehan | 6 | -276/+320 |
| 2022-12-17 | script, src, test: add Ukrainian support. | Jehan | 8 | -3/+267 |
| 2022-12-17 | script, src, test: adding Belarusian support. | Jehan | 8 | -2/+213 |
| 2022-12-17 | script, src, test: Bulgarian language models added. | Jehan | 6 | -193/+225 |
| 2022-12-16 | Issue #22: Hebrew CP862 support. | Jehan | 4 | -279/+326 |
| 2022-12-15 | src: all language models now rebuilt after the fix. | Jehan | 30 | -3329/+3272 |
| 2022-12-14 | scripts: all language models rebuilt with the new ratio data. | Jehan | 30 | -3298/+3461 |
| 2022-12-14 | src: improve algorithm for confidence computation. | Jehan | 2 | -5/+31 |
| 2022-12-14 | src: when checking for candidates, make sure we haven't any unprocessed… | Jehan | 1 | -1/+8 |
| 2022-12-14 | script, src: rebuild the English model. | Jehan | 1 | -167/+67 |
| 2022-12-14 | src: add a --language\|-l option to the uchardet CLI tool. | Jehan | 1 | -9/+30 |
| 2022-12-14 | src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. | Jehan | 4 | -14/+14 |
| 2022-12-14 | src: process pending language data when we are going to pass buffer size. | Jehan | 1 | -0/+11 |
| 2022-12-14 | script, src: rebuild the Danish model. | Jehan | 3 | -84/+118 |
| 2022-12-14 | script, src: update Norwegian model with the new language features. | Jehan | 4 | -180/+117 |
| 2022-12-14 | script, src: add English language model. | Jehan | 8 | -2/+300 |
| 2022-12-14 | src: drop less of UTF-8 confidence even with few non-multibyte chars. | Jehan | 1 | -2/+3 |
| 2022-12-14 | src: reset shortcut charset/language on Reset(). | Jehan | 1 | -0/+8 |
| 2022-12-14 | src: do not test with nsLatin1Prober anymore. | Jehan | 1 | -2/+9 |
| 2022-12-14 | src: improve confidence computation (generic and single-byte charset). | Jehan | 3 | -26/+31 |
| 2022-12-14 | script, src: regenerate the Thai model. | Jehan | 1 | -169/+194 |
| 2022-12-14 | src, script: fix the order of characters for Vietnamese. | Jehan | 1 | -266/+252 |
| 2022-12-14 | src, script: add concept of alphabet_mapping in language models. | Jehan | 1 | -101/+105 |
| 2022-12-14 | script: regenerate Slovak and Slovene with better alphabet support. | Jehan | 2 | -283/+287 |
| 2022-12-14 | script, src: regenerate the Vietnamese model. | Jehan | 1 | -159/+266 |
| 2022-12-14 | src: fix negative confidence wrapping around because of unsigned int. | Jehan | 1 | -1/+1 |
| 2022-12-14 | script, src: remove generated statistics data for Korean. | Jehan | 4 | -1315/+0 |
| 2022-12-14 | src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. | Jehan | 4 | -1/+313 |
| 2022-12-14 | src: consider any combination with a non-frequent character as sequence. | Jehan | 1 | -0/+10 |
| 2022-12-14 | src: add Hindi/UTF-8 support. | Jehan | 5 | -2/+233 |
| 2022-12-14 | src: improve confidence computation. | Jehan | 2 | -26/+108 |
| 2022-12-14 | script, src: add generic Korean model. | Jehan | 5 | -1/+1316 |
| 2022-12-14 | src, test: fix the new Johab prober and add a test. | Jehan | 3 | -8/+14 |
| 2022-12-14 | src: build new charset prober for Johab Korean. | Jehan | 6 | -6/+8 |
| 2022-12-14 | add charset prober for Johab Korean | LSY | 9 | -2/+1029 |
| 2022-12-14 | script, src: generate the Hebrew models. | Jehan | 6 | -172/+245 |
| 2022-12-14 | src: drop the SURE_YES confidence for character distribution probers. | Jehan | 1 | -1/+1 |
| 2022-12-14 | src: do not shortcut UTF-8 detection too early. | Jehan | 1 | -1/+3 |
| 2022-12-14 | src: nsEscCharsetProber also returns the correct language. | Jehan | 6 | -6/+21 |