From 6d31689632b48947f65536444682a487cab722f6 Mon Sep 17 00:00:00 2001
From: Jehan
Date: Fri, 16 Dec 2022 23:28:28 +0100
Subject: test: adding 2 tests for Hebrew/IBM862 recognition.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is the same text, taken from this Wikipedia page, which was today's
featured article on the Hebrew Wikipedia:
https://he.wikipedia.org/wiki/שתי מסכתות על ממשל מדיני

I put it in 2 variants, since IBM862 can be used in both logical and
visual variants. The visual variant is simply a matter of reversing the
order of the letters (per line, while the lines themselves stay in their
proper order), so that's what I did.

Note, though, that the English title quoted in the text should probably
not have been reversed. It doesn't matter much: these characters are
outside the Hebrew alphabet anyway and would trigger a bad sequence score
whatever their order. So I didn't bother fixing them.
---
 test/he/ibm862.logical.txt | 1 +
 test/he/ibm862.visual.txt  | 1 +
 2 files changed, 2 insertions(+)
 create mode 100644 test/he/ibm862.logical.txt
 create mode 100644 test/he/ibm862.visual.txt

diff --git a/test/he/ibm862.logical.txt b/test/he/ibm862.logical.txt
new file mode 100644
index 0000000..b22fa94
--- /dev/null
+++ b/test/he/ibm862.logical.txt
@@ -0,0 +1 @@
+ (: Two Treatises of Government) - ' , -1689.[1] (). "". , . , .
diff --git a/test/he/ibm862.visual.txt b/test/he/ibm862.visual.txt
new file mode 100644
index 0000000..5ce09f3
--- /dev/null
+++ b/test/he/ibm862.visual.txt
@@ -0,0 +1 @@
+. , . , ."" .)( ]1[.9861- , \' - )tnemnrevoG fo sesitaerT owT :(
-- 
cgit v1.2.3
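The logical-to-visual conversion described in the commit message can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to produce the test files. The helper names `logical_to_visual` and `encode_ibm862` are hypothetical, and the sketch assumes the source text is a Unicode string that is then encoded with Python's built-in `cp862` codec (IBM862). Like the test file, it performs a plain character reversal per line: brackets are not mirrored (so `(` becomes `)`), and embedded Latin text such as the English title is reversed as well.

```python
# Minimal sketch (hypothetical helpers, not the author's script):
# derive the "visual" variant from a logical-order text, then encode it
# as IBM862 using Python's standard "cp862" codec.


def logical_to_visual(text: str) -> str:
    """Reverse the characters of each line; the lines keep their order.

    This is a naive reversal: it does not mirror brackets and does not
    keep embedded left-to-right runs (e.g. English words) in their
    original order.
    """
    return "\n".join(line[::-1] for line in text.split("\n"))


def encode_ibm862(text: str) -> bytes:
    """Encode text as IBM862 (DOS Hebrew); Hebrew letters map to 0x80-0x9A."""
    return text.encode("cp862")
```

For example, `encode_ibm862(logical_to_visual(text))` would produce bytes in the layout of `ibm862.visual.txt`, while `encode_ibm862(text)` would produce the logical variant.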