-rw-r--r-- | langclass/ShortTexts/README | 32 |
1 files changed, 32 insertions, 0 deletions
diff --git a/langclass/ShortTexts/README b/langclass/ShortTexts/README
new file mode 100644
index 0000000..821b157
--- /dev/null
+++ b/langclass/ShortTexts/README
@@ -0,0 +1,32 @@
+nr: In order to try and distinguish xh, zu and nr from eachother, temporarily set
+    MAXNGRAMS 2000
+    MAXNGRAMSYMBOL 4
+    while generating those three fingerprints as a bit of a bodge
+shs: No UDHR translation available. Nor any lengthy cohesive text. So sample
+    text is cut and pasted phrases from http://www.firstvoices.ca/en/Secwepemc/phrase-book
+
+remaining languages with LibreOffice support missing fingerprints:
+
+sd-IN: The UDHR for Sindhi is basically a picture of the text, so can't extract it.
+
+these are just a little confusing with similar languages, just needs to be unpicked
+sdc-IT
+sdn-IT
+src-IT
+sro-IT
+
+ku-IQ
+ku-IR
+ku-SY
+ku-TR
+
+these are trickier
+sat-IN
+sma-SE
+smj-NO
+smj-SE
+smn-FI
+sms-FI
+sjd-RU
+rue-SK
+rue-UA
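
The nr note in the README above raises two limits while the xh, zu and nr fingerprints are generated. The following is a minimal sketch of why that might help, not the project's actual generator. It assumes the fingerprints are rank-ordered character n-gram profiles of the kind used by textcat-style classifiers, that MAXNGRAMS caps how many n-grams a fingerprint keeps, and that MAXNGRAMSYMBOL caps n-gram length in characters. The function names `fingerprint` and `out_of_place` are hypothetical.

```python
from collections import Counter

# Stand-ins for the two settings named in the README note.
# Their meaning here is an assumption, not taken from the project source.
MAXNGRAMS = 2000       # assumed: number of top-ranked n-grams kept
MAXNGRAMSYMBOL = 4     # assumed: longest n-gram, in characters


def fingerprint(text, max_ngrams=MAXNGRAMS, max_len=MAXNGRAMSYMBOL):
    """Return the most frequent character n-grams of text, best first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"   # mark word boundaries
        for n in range(1, max_len + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:max_ngrams]]


def out_of_place(doc_fp, lang_fp):
    """Sum of rank differences; n-grams absent from lang_fp cost len(lang_fp)."""
    rank = {gram: i for i, gram in enumerate(lang_fp)}
    penalty = len(lang_fp)
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(doc_fp))
```

Under these assumptions, a longer fingerprint keeps rarer n-grams that would otherwise be cut off. Closely related languages tend to share their most frequent n-grams, so the extra tail is where they are more likely to differ. Comparing a sample against each of the three fingerprints with `out_of_place` and picking the lowest score is the usual way such profiles are used.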