Age | Commit message | Author | Files | Lines |
2023-07-17 | src: handle long sequences of characters. | Jehan | 1 | -10/+21 |
2023-07-17 | Issue #33: crafted sequence of bytes triggers memory write past the bounds of… | Jehan | 1 | -2/+13 |
2023-07-17 | src: fix mismatched new [] / delete. | Jehan | 1 | -2/+2 |
2023-07-17 | Issue #32: Global buffer read overflow in `GetOrderFromCodePoint`. | Jehan | 1 | -13/+8 |
2022-12-20 | script, src, test: new Georgian support. | Jehan | 6 | -2/+299 |
2022-12-20 | script, src, test: adding Catalan support. | Jehan | 6 | -2/+218 |
2022-12-19 | src: new Big5 detection implementation. | Jehan | 6 | -1060/+118 |
2022-12-18 | Issue #21: Greek CP737 support. | Jehan | 3 | -152/+196 |
2022-12-18 | script, src: generate more code for language and sequence model listing. | Jehan | 43 | -202/+375 |
2022-12-17 | script, src, test: add Serbian support. | Jehan | 8 | -2/+285 |
2022-12-17 | src, script: add Macedonian support. | Jehan | 8 | -2/+331 |
2022-12-17 | script, src: regenerate Russian models and add UTF-8/Russian support. | Jehan | 6 | -276/+320 |
2022-12-17 | script, src, test: add Ukrainian support. | Jehan | 8 | -3/+267 |
2022-12-17 | script, src, test: adding Belarusian support. | Jehan | 8 | -2/+213 |
2022-12-17 | script, src, test: Bulgarian language models added. | Jehan | 6 | -193/+225 |
2022-12-16 | Issue #22: Hebrew CP862 support. | Jehan | 4 | -279/+326 |
2022-12-15 | src: all language models now rebuilt after the fix. | Jehan | 30 | -3329/+3272 |
2022-12-14 | scripts: all language models rebuilt with the new ratio data. | Jehan | 30 | -3298/+3461 |
2022-12-14 | src: improve algorithm for confidence computation. | Jehan | 2 | -5/+31 |
2022-12-14 | src: when checking for candidates, make sure we haven't any unprocessed… | Jehan | 1 | -1/+8 |
2022-12-14 | script, src: rebuild the English model. | Jehan | 1 | -167/+67 |
2022-12-14 | src: add a --language|-l option to the uchardet CLI tool. | Jehan | 1 | -9/+30 |
2022-12-14 | src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. | Jehan | 4 | -14/+14 |
2022-12-14 | src: process pending language data when we are going to pass buffer size. | Jehan | 1 | -0/+11 |
2022-12-14 | script, src: rebuild the Danish model. | Jehan | 3 | -84/+118 |
2022-12-14 | script, src: update Norwegian model with the new language features. | Jehan | 4 | -180/+117 |
2022-12-14 | script, src: add English language model. | Jehan | 8 | -2/+300 |
2022-12-14 | src: drop less of UTF-8 confidence even with few non-multibyte chars. | Jehan | 1 | -2/+3 |
2022-12-14 | src: reset shortcut charset/language on Reset(). | Jehan | 1 | -0/+8 |
2022-12-14 | src: do not test with nsLatin1Prober anymore. | Jehan | 1 | -2/+9 |
2022-12-14 | src: improve confidence computation (generic and single-byte charset). | Jehan | 3 | -26/+31 |
2022-12-14 | script, src: regenerate the Thai model. | Jehan | 1 | -169/+194 |
2022-12-14 | src, script: fix the order of characters for Vietnamese. | Jehan | 1 | -266/+252 |
2022-12-14 | src, script: add concept of alphabet_mapping in language models. | Jehan | 1 | -101/+105 |
2022-12-14 | script: regenerate Slovak and Slovene with better alphabet support. | Jehan | 2 | -283/+287 |
2022-12-14 | script, src: regenerate the Vietnamese model. | Jehan | 1 | -159/+266 |
2022-12-14 | src: fix negative confidence wrapping around because of unsigned int. | Jehan | 1 | -1/+1 |
2022-12-14 | script, src: remove generated statistics data for Korean. | Jehan | 4 | -1315/+0 |
2022-12-14 | src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. | Jehan | 4 | -1/+313 |
2022-12-14 | src: consider any combination with a non-frequent character as sequence. | Jehan | 1 | -0/+10 |
2022-12-14 | src: add Hindi/UTF-8 support. | Jehan | 5 | -2/+233 |
2022-12-14 | src: improve confidence computation. | Jehan | 2 | -26/+108 |
2022-12-14 | script, src: add generic Korean model. | Jehan | 5 | -1/+1316 |
2022-12-14 | src, test: fix the new Johab prober and add a test. | Jehan | 3 | -8/+14 |
2022-12-14 | src: build new charset prober for Johab Korean. | Jehan | 6 | -6/+8 |
2022-12-14 | add charset prober for Johab Korean | LSY | 9 | -2/+1029 |
2022-12-14 | script, src: generate the Hebrew models. | Jehan | 6 | -172/+245 |
2022-12-14 | src: drop the SURE_YES confidence for character distribution probers. | Jehan | 1 | -1/+1 |
2022-12-14 | src: do not shortcut UTF-8 detection too early. | Jehan | 1 | -1/+3 |
2022-12-14 | src: nsEscCharsetProber also returns the correct language. | Jehan | 6 | -6/+21 |