author | Jehan <jehan@girinstud.io> | 2022-12-20 01:56:24 +0100 |
---|---|---|
committer | Jehan <jehan@girinstud.io> | 2022-12-20 01:56:24 +0100 |
commit | 419a971e6a9de966ea3a0b255bfd598a7617dc59 | |
tree | b1d34df98702c86aa3e13b36ef00af8cb10646d3 | |
parent | d40e5868d5ec1f08f1e6e0d25e04dae68c586ba1 | |
script: update the README.
-rw-r--r-- | script/README | 11 |
1 files changed, 5 insertions, 6 deletions
diff --git a/script/README b/script/README
index 2b19c26..0c497fe 100644
--- a/script/README
+++ b/script/README
@@ -16,7 +16,7 @@ to recognize French text encoded in ISO-8859-15, but
 may fail at detecting ISO-8859-15 for non-supported languages.

 This is why, though less flexible, it also makes uchardet much more
-accurate than other detection system, as well as making it an efficient
+accurate than other detection systems, as well as making it an efficient
 language recognition system.
 Since many single-byte charsets actually share the same layout (or very
 similar ones), it is actually impossible to have an accurate single-byte
@@ -47,7 +47,7 @@ can just run `pip3 install -r requirements.txt`.

 Let's say you added (or modified) support for French (`fr`), run:

-> ./BuildLangModel.py fr --max-page=100 --max-depth=4
+> ./BuildLangModel.py fr --max-page=200 --max-depth=4

 The options can be changed to any value. Bigger values mean the script
 will process more data, so more processing time now, but uchardet may
@@ -55,12 +55,11 @@ possibly be more accurate in the end.

 ## Updating core code ##

-If you were only updating data for a language model, you have nothing
+If you were only updating data for an existing language model, you have nothing
 else to do. Just build `uchardet` again and test it.

-If you were creating new models though, you will have to add these in
-src/nsSBCSGroupProber.cpp and src/nsSBCharSetProber.h, and increase the
-value of `NUM_OF_SBCS_PROBERS` in src/nsSBCSGroupProber.h.
+If you were creating new models though, you will have to add the sequence models
+in src/nsSBCSGroupProber.cpp and the language model in src/nsMBCSGroupProber.cpp.
 Finally add the new file in src/CMakeLists.txt.
 I will be looking to make this step more straightforward in the future.
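The first hunk's context says that many single-byte charsets share the same layout, or very similar ones. The Python sketch below illustrates that point. It is not part of uchardet or of this commit, and it uses only Python's standard codecs. It lists the byte values on which ISO-8859-1 (codec `latin-1`) and ISO-8859-15 (codec `iso8859_15`) decode differently. The helper name `differing_bytes` and the French sample sentence are hypothetical, chosen for illustration only.

```python
def differing_bytes(codec_a: str, codec_b: str) -> list[int]:
    """Return the byte values (0-255) that decode to different characters in the two codecs."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        # errors="replace" keeps the loop safe for codecs that leave some bytes undefined
        if raw.decode(codec_a, errors="replace") != raw.decode(codec_b, errors="replace"):
            diffs.append(b)
    return diffs


# ISO-8859-15 only replaces eight positions of ISO-8859-1 (e.g. 0xA4: '¤' -> '€')
print([hex(b) for b in differing_bytes("latin-1", "iso8859_15")])
# ['0xa4', '0xa6', '0xa8', '0xb4', '0xb8', '0xbc', '0xbd', '0xbe']

# A hypothetical French sentence that avoids those eight bytes decodes identically in both
sample = "Le français est une langue romane.".encode("iso8859_15")
print(sample.decode("latin-1") == sample.decode("iso8859_15"))  # True
```

Ordinary French text rarely contains any of those eight bytes, so its byte stream is often identical under both charsets. Byte values alone therefore cannot separate the two, which is the situation the README describes.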