Issue #22: Hebrew CP862 support.

Added in both visual and logical order, since Wikipedia says:

> Hebrew text encoded using code page 862 was usually stored in visual
> order; nevertheless, a few DOS applications, notably a word processor
> named EinsteinWriter, stored Hebrew in logical order.

I am not using the nsHebrewProber wrapper (nameProber) for this new support, because I am really unsure it is of any use. Our statistical code, based on letter and sequence usage, should already be more than enough to detect both variants of the Hebrew encoding, and my testing so far confirms this: the new models get pretty outstanding scores on actual Hebrew tests, while all the other probers return bad scores. This will have to be studied a bit more later, and maybe the whole nsHebrewProber could be deleted, even for the Windows-1255 charset.

I'm also cleaning up the nsSBCSGroupProber::nsSBCSGroupProber() code a bit by incrementing a single index instead of maintaining the indexes by hand. Otherwise, each time we add probers in the middle (to keep them logically grouped by language), we have to manually increment the indexes of dozens of following probers.
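The single-index cleanup can be illustrated with a minimal C++ sketch. It assumes hypothetical `Prober` and `GroupProber` types standing in for uchardet's single-byte probers and `nsSBCSGroupProber`; the charset list and the `kMaxProbers` capacity are illustrative, not copied from the real constructor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a single-byte charset prober (not uchardet's class).
struct Prober {
  std::string charset;
  std::string language;
};

// Illustrative capacity; the real group prober defines its own constant.
constexpr std::size_t kMaxProbers = 8;

// Hypothetical group prober showing the "single running index" pattern.
class GroupProber {
 public:
  GroupProber() {
    std::size_t i = 0;
    // Hebrew: IBM862 sits next to its siblings. Nothing after it needs
    // renumbering, because every slot simply takes the next value of i.
    mProbers.at(i++) = {"ISO-8859-8", "Hebrew"};
    mProbers.at(i++) = {"WINDOWS-1255", "Hebrew"};
    mProbers.at(i++) = {"IBM862", "Hebrew"};
    // Hungarian
    mProbers.at(i++) = {"ISO-8859-2", "Hungarian"};
    mProbers.at(i++) = {"WINDOWS-1250", "Hungarian"};
    // The final index doubles as the number of probers in use.
    // (.at() throws std::out_of_range if the capacity is ever exceeded.)
    mCount = i;
  }

  std::size_t count() const { return mCount; }
  const Prober& prober(std::size_t n) const { return mProbers.at(n); }

 private:
  std::array<Prober, kMaxProbers> mProbers{};
  std::size_t mCount = 0;
};
```

With hand-maintained indexes, inserting the IBM862 entry would have meant bumping the index of every following prober; here only one line is added, and `count()` stays correct automatically.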
author: Jehan <jehan@girinstud.io> 2022-12-16 23:17:47 +0100
committer: Jehan <jehan@girinstud.io> 2022-12-16 23:27:52 +0100
commit: 0974920bddfbb1eb13a8d84aa1acd96822d9bf33
tree: e2e5d70aaa092af015f94ee601d4d2ea777e664f /README.md
parent: 127d7faf478d62533f2ead1e8df5f2d7a6276da1
1 files changed, 1 insertions, 0 deletions
diff --git a/README.md b/README.md
index b01494b..288e0b3 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,7 @@ uchardet started as a C language binding of the original C++ implementation of t
     * UTF-8
     * ISO-8859-8
     * WINDOWS-1255
+    * IBM862
   * Hindi
     * UTF-8
   * Hungarian: