author Caolán McNamara <caolanm@redhat.com> 2012-05-28 10:16:20 +0100
committer Caolán McNamara <caolanm@redhat.com> 2012-05-28 10:16:20 +0100
commit 0362d22b26eb154b85c891eeb4d776c986cf0ae9
tree 622de8fa63d6150621a131f6d0acf15d1b6f9cc4 /langclass
parent 66b709bbedbcc8d55da4e83739d95278082aaa68
add a README to track missing but wanted languages
Diffstat (limited to 'langclass')
-rw-r--r-- langclass/ShortTexts/README 32
1 files changed, 32 insertions, 0 deletions
diff --git a/langclass/ShortTexts/README b/langclass/ShortTexts/README
new file mode 100644
index 0000000..821b157
--- /dev/null
+++ b/langclass/ShortTexts/README
@@ -0,0 +1,32 @@
+nr: To try to distinguish xh, zu and nr from each other, temporarily set
+ MAXNGRAMS 2000
+ MAXNGRAMSYMBOL 4
+ while generating those three fingerprints as a bit of a bodge
+shs: No UDHR translation is available, nor any lengthy cohesive text, so the sample
+ text is cut-and-pasted phrases from http://www.firstvoices.ca/en/Secwepemc/phrase-book
+
+remaining languages with LibreOffice support that are still missing fingerprints:
+
+sd-IN: The UDHR for Sindhi is basically an image of the text, so the text can't be extracted.
+
+these are easily confused with similar languages and just need to be unpicked
+sdc-IT
+sdn-IT
+src-IT
+sro-IT
+
+ku-IQ
+ku-IR
+ku-SY
+ku-TR
+
+these are trickier
+sat-IN
+sma-SE
+smj-NO
+smj-SE
+smn-FI
+sms-FI
+sjd-RU
+rue-SK
+rue-UA
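
The nr note in the README above raises two limits while generating fingerprints for closely related languages. As a rough illustration of what such limits control, here is a minimal Python sketch of a rank-ordered character n-gram fingerprint. It is an assumption-laden sketch, not the library's actual implementation: the function names (fingerprint, out_of_place), the "_" word padding, and the tie-breaking rule are all hypothetical. Here max_ngrams plays the role the README attributes to MAXNGRAMS (how many n-grams are kept) and max_ngram_len the role of MAXNGRAMSYMBOL (the longest n-gram considered).

```python
from collections import Counter


def fingerprint(text, max_ngrams, max_ngram_len):
    """Return the most frequent character n-grams of text, most frequent first.

    Hypothetical helper: max_ngrams caps how many n-grams are kept,
    max_ngram_len caps the length of each n-gram.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries (an assumption of this sketch)
        for n in range(1, max_ngram_len + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:max_ngrams]]


def out_of_place(doc_fp, lang_fp):
    """Sum of rank differences between two fingerprints; lower means closer.

    N-grams missing from lang_fp are charged the maximum penalty len(lang_fp).
    """
    rank = {gram: i for i, gram in enumerate(lang_fp)}
    penalty = len(lang_fp)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_fp)
    )


if __name__ == "__main__":
    # Values taken from the nr note: keep up to 2000 n-grams of length <= 4.
    fp = fingerprint("sample text for one language", max_ngrams=2000, max_ngram_len=4)
    print(len(fp), out_of_place(fp, fp))
```

Keeping more n-grams per fingerprint gives rarer n-grams a chance to separate languages whose most common n-grams largely coincide, which is presumably why the note raises the count for xh, zu and nr only.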