author | Caolán McNamara <caolanm@redhat.com> | 2012-05-28 10:16:20 +0100 |
---|---|---|
committer | Caolán McNamara <caolanm@redhat.com> | 2012-05-28 10:16:20 +0100 |
commit | 0362d22b26eb154b85c891eeb4d776c986cf0ae9 | |
tree | 622de8fa63d6150621a131f6d0acf15d1b6f9cc4 /langclass | |
parent | 66b709bbedbcc8d55da4e83739d95278082aaa68 | |
add a README to track missing but wanted languages
Diffstat (limited to 'langclass')
-rw-r--r-- | langclass/ShortTexts/README | 32 |
1 files changed, 32 insertions, 0 deletions
diff --git a/langclass/ShortTexts/README b/langclass/ShortTexts/README
new file mode 100644
index 0000000..821b157
--- /dev/null
+++ b/langclass/ShortTexts/README
@@ -0,0 +1,32 @@
+nr: In order to try and distinguish xh, zu and nr from eachother, temporarily set
+ MAXNGRAMS 2000
+ MAXNGRAMSYMBOL 4
+ while generating those three fingerprints as a bit of a bodge
+shs: No UDHR translation available. Nor any lengthy cohesive text. So sample
+ text is cut and pasted phrases from http://www.firstvoices.ca/en/Secwepemc/phrase-book
+
+remaining languages with LibreOffice support missing fingerprints:
+
+sd-IN: The UDHR for Sindhi is basically a picture of the text, so can't extract it.
+
+these are just a little confusing with similar languages, just needs to be unpicked
+sdc-IT
+sdn-IT
+src-IT
+sro-IT
+
+ku-IQ
+ku-IR
+ku-SY
+ku-TR
+
+these are trickier
+sat-IN
+sma-SE
+smj-NO
+smj-SE
+smn-FI
+sms-FI
+sjd-RU
+rue-SK
+rue-UA
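
The README's note about MAXNGRAMS and MAXNGRAMSYMBOL concerns how large a language fingerprint is and how long its n-grams may be. As a rough illustration only, the sketch below builds a ranked character n-gram fingerprint in Python. It is not the library's C implementation: the function name `fingerprint`, its parameters `max_ngrams` and `max_symbol`, and the underscore word-boundary padding are all assumptions made for this example.

```python
from collections import Counter


def fingerprint(text, max_ngrams, max_symbol):
    """Return the most frequent character n-grams of `text`, best first.

    Hypothetical helper for illustration only:
    max_ngrams plays the role of MAXNGRAMS (how many n-grams are kept),
    max_symbol plays the role of MAXNGRAMSYMBOL (longest n-gram, in characters).
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # assumed convention: mark word boundaries with "_"
        for n in range(1, max_symbol + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Highest count first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:max_ngrams]]
```

In this sketch, calling `fingerprint(sample_text, max_ngrams=2000, max_symbol=4)` would mirror the temporary settings the README describes for nr, xh and zu. A longer fingerprint keeps rarer n-grams, which may be what helps separate closely related languages, though the README itself does not say so.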