This is FriBidi, a Free Implementation of the Unicode BiDi algorithm.
Background
==========
One of the missing links stopping the penetration of free software in
Israel is the lack of support for Hebrew. In order to have proper
Hebrew support, the BiDi algorithm must be implemented. It is my hope
that this library will stimulate more Hebrew free software.
Of course the BiDi algorithm is not limited to Hebrew, so I expect
that our Arab neighbors will also find this software useful.
Audience
========
It is my hope that this library will stimulate the implementation of
Hebrew and Arabic in lots of free software. Here is a small list of
projects that would benefit from the use of the FriBidi library, but
of course there are many more: Wine, Mozilla, Gtk, Gnome, Qt, KDE,
AbiWord, lynx, StarOffice.
Downloading
===========
The latest version of FriBidi may be found at:
http://fribidi.sourceforge.net/
Building
========
See INSTALL for a description of how to build and install this library.
Implementation
==============
The library implements most of the algorithm as described in the
"Unicode Bidirectional Algorithm, Working Draft Unicode Technical Report
#9, http://www.unicode.org/unicode/reports/tr9/". The major feature
that is currently missing in fribidi is the support for explicit overrides.
In the API I was inspired by the document "Bi-Di languages support
- BiDi API proposal", http://www.langbox.com/AraMosaic/mozilla/BiDi_API.html,
by Franck Portaneri <franck@langbox.com>, which he wrote as a proposal
for adding BiDi support to Mozilla.
Internally the library uses Unicode entirely. The character property
function was automatically created from the Unicode property list
document PropList.txt available from the Unicode ftp site. This
means that every Unicode character should be treated in strict
accordance with the Unicode specification. The same is true for the
mirroring of characters, which also works for all the characters
listed as mirrorable in the Unicode specification.
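To make the idea of mirroring concrete, here is a minimal sketch of a
mirror lookup. It is not the table used by the library: the function
name sketch_mirror_char is hypothetical, and only a handful of pairs are
listed, whereas the Unicode data lists many more.

```c
#include <stdint.h>

/* Hypothetical helper, not the FriBidi table: return the mirrored
 * counterpart of ch, or ch itself when it has none. Only a few
 * pairs are listed here for illustration. */
uint32_t sketch_mirror_char(uint32_t ch)
{
    static const uint32_t pairs[][2] = {
        { '(', ')' }, { '<', '>' }, { '[', ']' }, { '{', '}' },
        { 0x00AB, 0x00BB }   /* left and right guillemets */
    };
    unsigned i;

    for (i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
        if (ch == pairs[i][0])
            return pairs[i][1];
        if (ch == pairs[i][1])
            return pairs[i][0];
    }
    return ch;   /* not mirrorable in this sketch */
}
```

In the BiDi algorithm a mirrorable character is replaced by its
counterpart only when it ends up in a right-to-left run, which is why
the brackets around reversed text still open and close the right way.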
Other character sets must be converted into Unicode before the library
may be used. To use e.g. iso8859-8, the function

  void
  fribidi_iso8859_8_to_unicode(guchar *s,
                               /* output */
                               FriBidiChar *us)

must be called; it translates the guchar string s into a Unicode
string. There is also a corresponding fribidi_unicode_to_iso8859_8
that may be called to convert the string back to iso8859-8 for output.
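As an illustration of what such a conversion involves, here is a
self-contained sketch. It is not the library's implementation: the
sketch_ names are hypothetical, uint32_t stands in for FriBidiChar, and
only ASCII and the Hebrew letters (bytes 0xE0 to 0xFA, which map to
U+05D0 to U+05EA) are handled. Every other byte is mapped to U+FFFD,
which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one ISO 8859-8 byte to a Unicode
 * code point. Only ASCII and the Hebrew letters are handled. */
uint32_t sketch_iso8859_8_char_to_unicode(unsigned char c)
{
    if (c < 0x80)
        return c;                               /* ASCII is unchanged */
    if (c >= 0xE0 && c <= 0xFA)
        return 0x05D0 + (uint32_t)(c - 0xE0);   /* alef .. tav */
    return 0xFFFD;                              /* not handled in this sketch */
}

/* Convert the NUL-terminated ISO 8859-8 string s into us, which must
 * have room for one element per input byte. Returns the number of
 * characters converted. */
size_t sketch_iso8859_8_to_unicode(const unsigned char *s, uint32_t *us)
{
    size_t i;

    assert(s != NULL && us != NULL);
    for (i = 0; s[i] != 0; i++)
        us[i] = sketch_iso8859_8_char_to_unicode(s[i]);
    return i;
}
```

Because the Hebrew block is laid out in the same order in both
encodings, the conversion needs only an offset rather than a table.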
The reordering of characters is done through the function:

  void
  fribidi_log2vis(/* input */
                  FriBidiChar *str,
                  int len,
                  FriBidiCharType *pbase_dir,
                  /* output */
                  FriBidiChar *visual_str,
                  gint *position_L_to_V_list,
                  gint *position_V_to_L_list,
                  gint8 *embedding_level_list
                  )

where

  str                   is the Unicode input string.
  len                   is the length of the Unicode string.
  pbase_dir             is the input and output base direction. If
                        the base direction passed in is FRIBIDI_TYPE_N,
                        then fribidi_log2vis calculates the base
                        direction on its own according to the BiDi
                        algorithm.
  visual_str            The reordered output Unicode string.
  position_L_to_V_list  Maps the positions in the logical string to
                        positions in the visual string.
  position_V_to_L_list  Maps the positions in the visual string to
                        the positions in the logical string.
  embedding_level_list  Returns the embedding level of each character.
                        Even levels indicate LTR characters, and odd
                        levels indicate RTL characters. The main use of
                        this list is in interactive applications, e.g.
                        when the embedding level determines cursor
                        display.

If any of the output pointers is NULL, then that information is not
calculated.
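The relationship between the embedding levels and the two position
lists can be sketched with a small standalone routine. This is not the
library's code: the name sketch_reorder_by_levels is hypothetical, and
the levels are assumed to be already resolved. It implements only the
reversal step of the BiDi algorithm (rule L2 in current revisions of
UAX #9): from the highest level down to the lowest odd level, reverse
every contiguous run of characters at that level or higher.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not FriBidi code. Given resolved embedding
 * levels, fill v_to_l (visual index -> logical index) and l_to_v
 * (logical index -> visual index). Both arrays need len elements. */
void sketch_reorder_by_levels(const int8_t *levels, int len,
                              int *l_to_v, int *v_to_l)
{
    int i, level, max_level = 0, min_odd = 127;

    assert(len >= 0);
    for (i = 0; i < len; i++) {
        v_to_l[i] = i;                       /* start in logical order */
        if (levels[i] > max_level)
            max_level = levels[i];
        if ((levels[i] & 1) && levels[i] < min_odd)
            min_odd = levels[i];
    }

    /* Reverse every maximal run whose level is at least `level`.
     * The loop body never runs when there are no odd levels. */
    for (level = max_level; level >= min_odd; level--) {
        i = 0;
        while (i < len) {
            if (levels[v_to_l[i]] >= level) {
                int start = i, end;
                while (i < len && levels[v_to_l[i]] >= level)
                    i++;
                for (end = i - 1; start < end; start++, end--) {
                    int tmp = v_to_l[start];
                    v_to_l[start] = v_to_l[end];
                    v_to_l[end] = tmp;
                }
            } else {
                i++;
            }
        }
    }

    /* The two lists are inverse permutations of each other. */
    for (i = 0; i < len; i++)
        l_to_v[v_to_l[i]] = i;
}
```

For the levels {0, 1, 1, 2, 2, 1} the sketch yields a visual-to-logical
list of {0, 5, 3, 4, 2, 1}: the level 1 run is reversed as a whole,
while the nested level 2 pair keeps its left-to-right order. The visual
string is then obtained by reading the logical string through that list.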
A test program test_fribidi has been written to test out the algorithm.
test_fribidi currently works on iso8859-8 by default, but by adding
the flag -capital_rtl it treats capital letters as RTL, as is done
for illustration purposes in the Unicode specification.
What it looks like
==================
Here is the output of
./test_fribidi -capital_rtl tests/test-capital-rtl
car is THE CAR in arabic => car is RAC EHT in arabic
CAR IS the car IN ENGLISH => HSILGNE NI the car SI RAC
he said "IT IS 123, 456, OK" => he said "KO ,456 ,123 SI TI"
he said "IT IS (123, 456), OK" => he said "KO ,(456 ,123) SI TI"
he said "IT IS 123,456, OK" => he said "KO ,123,456 SI TI"
he said "IT IS (123,456), OK" => he said "KO ,(123,456) SI TI"
HE SAID "it is 123, 456, ok" => "it is 123, 456, ok" DIAS EH
<H123>shalom</H123> => <123H/>shalom<123H>
<h123>SAALAM</h123> => <h123>MALAAS</h123>
HE SAID "it is a car!" AND RAN => NAR DNA "!it is a car" DIAS EH
HE SAID "it is a car!x" AND RAN => NAR DNA "it is a car!x" DIAS EH
Executable
==========
There is also a command-line utility called fribidi that loops over
the text of a file and performs the BiDi algorithm on each line. Run
fribidi with the help option to learn its usage.
Bugs and comments
=================
Report FriBidi bugs at:
http://sourceforge.net/bugs/?group_id=2722
And send your comments to:
fribidi-discuss@lists.sourceforge.net