author Jehan <jehan@girinstud.io> 2022-12-20 01:56:24 +0100
committer Jehan <jehan@girinstud.io> 2022-12-20 01:56:24 +0100
commit 419a971e6a9de966ea3a0b255bfd598a7617dc59
tree b1d34df98702c86aa3e13b36ef00af8cb10646d3
parent d40e5868d5ec1f08f1e6e0d25e04dae68c586ba1
script: update the README.
-rw-r--r-- script/README 11
1 files changed, 5 insertions, 6 deletions
diff --git a/script/README b/script/README
index 2b19c26..0c497fe 100644
--- a/script/README
+++ b/script/README
@@ -16,7 +16,7 @@ to recognize French text encoded in ISO-8859-15, but may fail at
detecting ISO-8859-15 for non-supported languages.
This is why, though less flexible, it also makes uchardet much more
-accurate than other detection system, as well as making it an efficient
+accurate than other detection systems, as well as making it an efficient
language recognition system.
Since many single-byte charsets actually share the same layout (or very
similar ones), it is actually impossible to have an accurate single-byte
@@ -47,7 +47,7 @@ can just run `pip3 install -r requirements.txt`.
Let's say you added (or modified) support for French (`fr`), run:
-> ./BuildLangModel.py fr --max-page=100 --max-depth=4
+> ./BuildLangModel.py fr --max-page=200 --max-depth=4
The options can be changed to any value. Bigger values mean the script
will process more data, so more processing time now, but uchardet may
@@ -55,12 +55,11 @@ possibly be more accurate in the end.
## Updating core code ##
-If you were only updating data for a language model, you have nothing
+If you were only updating data for an existing language model, you have nothing
else to do. Just build `uchardet` again and test it.
-If you were creating new models though, you will have to add these in
-src/nsSBCSGroupProber.cpp and src/nsSBCharSetProber.h, and increase the
-value of `NUM_OF_SBCS_PROBERS` in src/nsSBCSGroupProber.h.
+If you were creating new models though, you will have to add the sequence models
+in src/nsSBCSGroupProber.cpp and the language model in src/nsMBCSGroupProber.cpp.
Finally add the new file in src/CMakeLists.txt.
I will be looking to make this step more straightforward in the future.