<chapter id="what-is-harfbuzz">
  <title>What is HarfBuzz?</title>
  <para>
    HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
    give HarfBuzz a font and a string containing a sequence of Unicode
    codepoints, HarfBuzz selects and positions the corresponding
    glyphs from the font, applying all of the necessary layout rules
    and font features. HarfBuzz then returns the resulting glyphs to
    you in a form that is correctly arranged for the language and
    writing system.
  </para>
  <para>
    HarfBuzz can properly shape all of the world's major writing
    systems. It runs on all major operating systems and software
    platforms, and it supports all of the modern font formats in use
    today.
  </para>
  <section id="what-is-text-shaping">
    <title>What is text shaping?</title>
    <para>
      Text shaping is the process of translating a string of character
      codes (such as Unicode codepoints) into a properly arranged
      sequence of glyphs that can be rendered onto a screen or into
      final output form for inclusion in a document.
    </para>
    <para>
      The shaping process is dependent on the input string, the active
      font, the script (or writing system) that the string is in, and
      the language that the string is in.
    </para>
    <para>
      Modern software systems generally only deal with strings in the
      Unicode encoding scheme (although legacy systems and documents may
      involve other encodings).
    </para>
    <para>
      There are several font formats that a program might
      encounter, each of which has a set of standard text-shaping
      rules.
    </para>
    <para>The dominant format is <ulink
      url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
    OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
    various scripts from around the world. These shaping models depend on
    the font including certain features in its <literal>GSUB</literal>
    and <literal>GPOS</literal> tables.
    </para>
    <para>
      Alternatively, OpenType fonts can include shaping features for
      the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
    </para>
    <para>
      TrueType fonts can also include OpenType shaping
      features. Alternatively, TrueType fonts can include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
      Advanced Typography</ulink> (AAT) tables to implement shaping
      support. AAT fonts are generally only found on macOS and iOS systems.
    </para>
    <para>
      Text strings will usually be tagged with script and language
      tags that provide the context needed to perform text shaping
      correctly.  The necessary <ulink
      url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>
      and <ulink
      url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
      tags are defined by OpenType.
    </para>
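    <para>
      OpenType tags are four-character codes, and software commonly
      handles them as packed 32-bit integers. The following Python
      sketch illustrates that packing. It is not part of the HarfBuzz
      API: the helper names <literal>ot_tag</literal> and
      <literal>tag_to_str</literal> are hypothetical, and the
      <literal>arab</literal> script tag and <literal>URD</literal>
      language tag serve only as sample values.
    </para>
    <programlisting language="python"><![CDATA[
```python
def ot_tag(text: str) -> int:
    """Pack a four-character OpenType tag into a 32-bit big-endian integer."""
    data = text.encode("ascii")
    if len(data) > 4:
        raise ValueError("OpenType tags are at most four characters long")
    # Tags shorter than four characters are padded with trailing spaces.
    data = data.ljust(4, b" ")
    return int.from_bytes(data, "big")


def tag_to_str(tag: int) -> str:
    """Unpack a 32-bit tag value back into its four-character form."""
    return tag.to_bytes(4, "big").decode("ascii")


script = ot_tag("arab")    # sample script tag (Arabic)
language = ot_tag("URD")   # sample language tag (Urdu), stored padded as "URD "

print(hex(script), repr(tag_to_str(language)))
```
]]></programlisting>
    <para>
      Keeping tags as integers makes them cheap to compare and to use
      as lookup keys, which is presumably why this representation is
      so common.
    </para>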
  </section>
  
  <section id="why-do-i-need-a-shaping-engine">
    <title>Why do I need a shaping engine?</title>
    <para>
      Text shaping is an integral part of preparing text for
      display. Before a Unicode sequence can be rendered, the
      codepoints in the sequence must be mapped to the corresponding
      glyphs provided in the font, and those glyphs must be positioned
      correctly relative to each other. For many of the scripts
      supported in Unicode, these steps involve script-specific layout
      rules, including complex joining, reordering, and positioning
      behavior. Implementing these rules is the job of the shaping engine.
    </para>
    <para>
      Text shaping is a fairly low-level operation. HarfBuzz is
      used directly by text-handling libraries like <ulink
      url="https://www.pango.org/">Pango</ulink>, as well as by the layout
      engines in Firefox, LibreOffice, and Chromium. Unless you are
      <emphasis>writing</emphasis> one of these layout engines
      yourself, you will probably not need to use HarfBuzz: normally,
      lower-level libraries will turn text into glyphs for you.
    </para>
    <para>
      However, if you <emphasis>are</emphasis> writing a layout engine
      or graphics library yourself, then you will need to perform text
      shaping, and this is where HarfBuzz can help you.
    </para>
    <para>
      Here are some specific scenarios where a text-shaping engine
      like HarfBuzz helps you:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          OpenType fonts contain a set of glyphs (that is, shapes
	  to represent the letters, numbers, punctuation marks, and
	  all other symbols), which are indexed by a <literal>glyph ID</literal>.
	</para>
	<para>
          A particular glyph ID within the font does not necessarily
	  correlate to a predictable Unicode codepoint. For instance,
	  some fonts have the letter &quot;a&quot; as glyph ID 1, but
	  many others do not. In order to retrieve the right glyph
	  from the font to display &quot;a&quot;, you need to consult
	  the table inside the font (the <literal>cmap</literal>
	  table) that maps Unicode codepoints to glyph IDs. In other
	  words, <emphasis>text shaping turns codepoints into glyph
	  IDs</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Many OpenType fonts contain ligatures: combinations of
          characters that are rendered as a single unit. For instance,
	  it is common for the <literal>fi</literal> letter
	  combination to appear in print as the single ligature glyph
	  &quot;ﬁ&quot;.
	</para>
	<para>
	  Whether you should render an &quot;f, i&quot; sequence
	  as <literal>fi</literal> or as &quot;ﬁ&quot; does not
          depend on the input text. Instead, it depends on whether
	  or not the font includes an &quot;ﬁ&quot; glyph and on the
	  level of ligature application you wish to perform. The font
	  and the amount of ligature application used are under your
	  control. In other words, <emphasis>text shaping involves
	  querying the font's ligature tables and determining what
	  substitutions should be made</emphasis>. 
        </para>
      </listitem>
      <listitem>
        <para>
          While ligatures like &quot;ﬁ&quot; are optional typographic
          refinements, some languages <emphasis>require</emphasis> certain
          substitutions to be made in order to display text correctly.
        </para>
	<para>
	  For example, in Tamil, when the letter &quot;TTA&quot; (ட)
	  is followed by the vowel sign &quot;U&quot; (ு), the pair
	  must be replaced by the single glyph &quot;டு&quot;. The
	  sequence of Unicode characters &quot;ட, ு&quot; needs to be
	  substituted with a single &quot;டு&quot; glyph from the
	  font.
	</para>
	<para>
	  But &quot;டு&quot; does not have a Unicode codepoint. To
	  find this glyph, you need to consult the table inside 
	  the font (the <literal>GSUB</literal> table) that contains
	  substitution information. In other words, <emphasis>text shaping 
	  chooses the correct glyph for a sequence of characters
	  provided</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Similarly, an Arabic letter can have up to four different variants
	  corresponding to the different positions it might appear in
	  within a sequence. Inside a font, there will be separate
	  glyphs for the initial, medial, final, and isolated forms of
	  each letter, each at a different glyph ID.
	</para>
	<para>
	  Unicode only assigns one codepoint per character, so a
	  Unicode string will not tell you which glyph variant to use
	  for each character. To decide, you need to analyze the whole
	  string and determine the appropriate glyph for each character
	  based on its position. In other words, <emphasis>text
	  shaping chooses the correct form of the letter by its
	  position and returns the correct glyph from the font</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Other languages involve marks and accents that need to be
          rendered in specific positions relative to a base character. For
          instance, the Moldovan language includes the Cyrillic letter
          &quot;zhe&quot; (ж) with a breve accent, like so: &quot;ӂ&quot;.
	</para>
	<para>
	  Some fonts will provide this character as a single
	  zhe-with-breve glyph, but other fonts will not and, instead,
	  will expect the rendering engine to form the character by 
          superimposing the separate &quot;ж&quot; and &quot;˘&quot;
	  glyphs.
	</para>
	<para>
	  But exactly where you should draw the breve depends on the
	  height and width of the preceding zhe glyph. To find the
	  right position, you need to consult the table inside
	  the font (the <literal>GPOS</literal> table) that contains
	  positioning information.
          In other words, <emphasis>text shaping tells you whether you
	  have a precomposed glyph within your font or if you need to
	  compose a glyph yourself out of combining marks&mdash;and,
	  if so, where to position those marks.</emphasis>
        </para>
      </listitem>
    </itemizedlist>
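    <para>
      The first two scenarios in the list above can be sketched in a
      few lines of Python. This is only a toy model under stated
      assumptions: the tables, the glyph IDs, and the function names
      <literal>map_codepoints</literal> and
      <literal>apply_ligatures</literal> are all invented for
      illustration, and real <literal>cmap</literal> and
      <literal>GSUB</literal> tables are considerably more involved.
    </para>
    <programlisting language="python"><![CDATA[
```python
# Toy font data: every glyph ID and table entry here is made up for illustration.
CMAP = {"f": 71, "i": 74, "n": 79, "d": 69}   # character -> glyph ID
LIGATURES = {(71, 74): 301}                    # (f, i) -> one ligature glyph
NOTDEF = 0                                     # glyph ID 0 is the "missing glyph"


def map_codepoints(text):
    """Step 1: look up each character in the cmap-like table."""
    return [CMAP.get(ch, NOTDEF) for ch in text]


def apply_ligatures(glyphs, enabled=True):
    """Step 2: replace glyph pairs that have a ligature entry."""
    if not enabled:
        return list(glyphs)
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])   # two input glyphs become one
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


glyphs = map_codepoints("find")
print(glyphs)                           # glyph IDs before substitution
print(apply_ligatures(glyphs))          # ligature applied
print(apply_ligatures(glyphs, False))   # ligatures switched off by the caller
```
]]></programlisting>
    <para>
      The <literal>enabled</literal> flag stands in for the point made
      above: whether a ligature is applied depends on the font and on
      the caller's settings, not on the input text.
    </para>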
    <para>
      If tasks like these are something that you need to do, then you
      need a text shaping engine. You could use Uniscribe if you are
      writing Windows software; you could use CoreText on macOS; or
      you could use HarfBuzz.
    </para>
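    <para>
      To give a feel for the positional analysis that Arabic text
      needs, here is a deliberately simplified Python sketch. It
      assumes a tiny hand-written table of joining behavior for five
      letters and ignores marks, joiner characters, and everything
      else a real shaper handles. The function name
      <literal>positional_forms</literal> is hypothetical; the form
      names it returns follow the OpenType feature tags
      <literal>isol</literal>, <literal>init</literal>,
      <literal>medi</literal>, and <literal>fina</literal>.
    </para>
    <programlisting language="python"><![CDATA[
```python
# Simplified joining types for a handful of Arabic letters (not a complete table).
DUAL, RIGHT = "D", "R"
JOINING = {
    "\u0628": DUAL,   # BEH
    "\u062A": DUAL,   # TEH
    "\u064A": DUAL,   # YEH
    "\u0627": RIGHT,  # ALEF: joins only to the preceding letter
    "\u062F": RIGHT,  # DAL: joins only to the preceding letter
}


def positional_forms(word):
    """Return one form name ("isol", "init", "medi", "fina") per letter."""
    forms = []
    for i, ch in enumerate(word):
        kind = JOINING.get(ch)
        prev_kind = JOINING.get(word[i - 1]) if i > 0 else None
        next_kind = JOINING.get(word[i + 1]) if i + 1 < len(word) else None
        # A letter connects backwards only if the previous letter is dual-joining.
        joins_prev = kind is not None and prev_kind == DUAL
        # Only dual-joining letters connect forwards, and only to a joining letter.
        joins_next = kind == DUAL and next_kind is not None
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms


print(positional_forms("\u0628\u0627\u0628"))  # BEH ALEF BEH
print(positional_forms("\u0628\u064A\u062A"))  # BEH YEH TEH
```
]]></programlisting>
    <para>
      Note that the same letter (BEH) receives different forms in the
      first word depending only on its neighbors, which is why the
      whole string has to be analyzed.
    </para>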
    <note>
      <para>
	In the rest of this manual, the text will assume that the reader
	is that implementor of a text-layout engine.
      </para>
    </note>
  </section>
  

  <section>
    <title>What does HarfBuzz do?</title>
    <para>
      HarfBuzz provides text shaping through a cross-platform
      C API that accepts sequences of Unicode codepoints as input. Currently,
      the following OpenType shaping models are supported:
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  Indic (covering Devanagari, Bengali, Gujarati,
	  Gurmukhi, Kannada, Malayalam, Oriya, Tamil, Telugu, and
	  Sinhala)
	</para>
      </listitem>
      <listitem>
	<para>
	  Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
	</para>
      </listitem>
      <listitem>
	<para>
	  Thai and Lao
	</para>
      </listitem>
      <listitem>
	<para>
	  Khmer
	</para>
      </listitem>
      <listitem>
	<para>
	  Myanmar
	</para>
      </listitem>
      
      <listitem>
	<para>
	  Tibetan
	</para>
      </listitem>
      
      <listitem>
	<para>
	  Hangul
	</para>
      </listitem>
      
      <listitem>
	<para>
	  Hebrew
	</para>
      </listitem>      
      <listitem>
	<para>
	  The Universal Shaping Engine or <emphasis>USE</emphasis>
	  (covering complex scripts not covered by the above shaping
	  models)
	</para>
      </listitem>      
      <listitem>
	<para>
	  A default shaping model for non-complex scripts
	  (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
	  and many others)
	</para>
      </listitem>
      <listitem>
	<para>
	  Emoji (including emoji modifier sequences, flag sequences,
	  and ZWJ sequences)
	</para>
      </listitem>
    </itemizedlist>

    <para>
      In addition to OpenType shaping, HarfBuzz supports the latest
      version of Graphite shaping. HarfBuzz currently supports AAT
      shaping only on macOS and iOS systems, and in a pass-through
      fashion: HarfBuzz hands off AAT support to the system CoreText
      library. However, full, built-in AAT support within HarfBuzz is
      under development.
    </para>
    
    <para>
      HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
      collections (.ttc), and OpenType fonts (.otf, including those
      fonts that contain TrueType-style outlines and those that
      contain PostScript CFF or CFF2 outlines).
    </para>
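    <para>
      These container types can usually be told apart by the
      four-byte signature at the start of the file. The Python sketch
      below shows one way to do that. The signature values come from
      the OpenType and Apple TrueType specifications rather than from
      this manual, the function name
      <literal>sniff_font_container</literal> is hypothetical, and the
      sketch does not reflect how HarfBuzz itself loads fonts.
    </para>
    <programlisting language="python"><![CDATA[
```python
def sniff_font_container(data: bytes) -> str:
    """Classify font data by the four-byte signature at the start of the file."""
    if len(data) < 4:
        return "unknown"
    signature = data[:4]
    if signature == b"ttcf":
        return "font collection"
    if signature == b"OTTO":
        return "OpenType with CFF or CFF2 outlines"
    if signature in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType-style outlines"
    # Anything else (for example a compressed web font) is out of scope here.
    return "unknown"


# Sample headers only; no real font file is read in this sketch.
for header in (b"\x00\x01\x00\x00", b"OTTO", b"ttcf", b"wOFF"):
    print(header, "->", sniff_font_container(header))
```
]]></programlisting>
    <para>
      With a real file, you would pass the first four bytes read from
      it in binary mode.
    </para>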

    <para>
      HarfBuzz can run on top of the FreeType, CoreText, DirectWrite,
      or Uniscribe font renderers.
    </para>
    
    <para>
      In addition to its core shaping functionality, HarfBuzz provides
      functions for accessing other font features, including optional
      GSUB and GPOS OpenType features, as well as
      all color-font formats (<literal>CBDT</literal>,
      <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
      <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
      also includes a font-subsetting feature.
    </para>

    <para>
      HarfBuzz can perform some low-level math-shaping operations, 
      although it does not currently perform full shaping for
      mathematical typesetting.
    </para>
    
    <para>
      A suite of command-line utilities is also provided in the
      source-code tree, designed to help users test and debug
      HarfBuzz's features on real-world fonts and input.
    </para>
  </section>

  <section id="what-harfbuzz-doesnt-do">
    <title>What HarfBuzz doesn't do</title>
    <para>
      HarfBuzz will take a Unicode string, shape it, and give you the
      information required to lay it out correctly on a single
      horizontal (or vertical) line using the font provided. That is the
      extent of HarfBuzz's responsibility.
    </para>
    <para>
      It is important to note that if you are implementing a complete
      text-layout engine you may have other responsibilities that
      HarfBuzz will <emphasis>not</emphasis> help you with. For example:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          HarfBuzz won't help you with bidirectionality. If you want to
          lay out text that includes a mix of Hebrew and English, you
	  will need to ensure that each buffer provided to HarfBuzz has its
          characters in the correct layout order. This will be different
          from the logical order in which the Unicode text is stored. In
          other words, the user will hit the keys in the following
          sequence:
        </para>
        <programlisting>
	  A B C [space] ג ב א [space] D E F
        </programlisting>
        <para>
          but will expect to see in the output:
        </para>
        <programlisting>
	  ABC אבג DEF
        </programlisting>
        <para>
          This reordering is called <emphasis>bidi processing</emphasis>
          (&quot;bidi&quot; is short for bidirectional). The Unicode
          Bidirectional Algorithm, published as Unicode Standard Annex #9,
          tells you how to reorder a string from logical order into
          presentation order.
          Before sending your string to HarfBuzz, you may need to apply the
          bidi algorithm to it. Libraries such as <ulink
	  url="http://icu-project.org/">ICU</ulink> and <ulink
	  url="http://fribidi.org/">fribidi</ulink> can do this for you.
        </para>
      </listitem>
      <listitem>
        <para>
          HarfBuzz won't help you with text that contains different font
          properties. For instance, if you have the string &quot;a
          <emphasis>huge</emphasis> breakfast&quot;, and you expect
          &quot;huge&quot; to be italic, then you will need to send three
          strings to HarfBuzz: <literal>a</literal>, in your Roman font;
          <literal>huge</literal> using your italic font; and
          <literal>breakfast</literal> using your Roman font again.
	</para>
	<para>
          Similarly, if you change the font, font size, script,
	  language, or direction within your string, then you will
	  need to shape each run independently and output them
	  independently. HarfBuzz expects to shape a run of characters
	  that all share the same properties.
        </para>
      </listitem>
      <listitem>
        <para>
          HarfBuzz won't help you with line breaking, hyphenation, or
          justification. As mentioned above, HarfBuzz lays out the string
          along a <emphasis>single line</emphasis> of, notionally,
          infinite length. If you want to find out where the potential
          word, sentence and line break points are in your text, you
          could use the ICU library's break iterator functions.
        </para>
        <para>
          HarfBuzz can tell you how wide a shaped piece of text is, which is
          useful input to a justification algorithm, but it knows nothing
          about paragraphs, lines or line lengths. Nor will it adjust the
          space between words to fit them proportionally into a line. If you
          want to lay out text in paragraphs, you will probably want to send
          each word of your text to HarfBuzz to determine its shaped width
          after glyph substitutions, then work out how many words will fit
          on a line, and then finally output each word of the line separated
          by a space of the correct size to fully justify the paragraph.
        </para>
      </listitem>
    </itemizedlist>
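    <para>
      Splitting text into runs that share the same properties, as
      described in the list above, might look like the following
      Python sketch. It assumes that each character already carries
      its properties (here just a font name, but it could equally be a
      tuple of font, size, script, language, and direction). The
      helper names <literal>styled</literal> and
      <literal>split_runs</literal> are hypothetical.
    </para>
    <programlisting language="python"><![CDATA[
```python
from itertools import groupby


def styled(text, props):
    """Attach the same properties to every character of text."""
    return [(ch, props) for ch in text]


def split_runs(chars):
    """Group consecutive (character, properties) pairs that share properties."""
    runs = []
    for props, group in groupby(chars, key=lambda item: item[1]):
        runs.append(("".join(ch for ch, _ in group), props))
    return runs


text = (styled("a ", "Roman")
        + styled("huge", "Italic")
        + styled(" breakfast", "Roman"))

# Each run would then be shaped on its own, with its own font.
for run_text, font in split_runs(text):
    print(repr(run_text), font)
```
]]></programlisting>
    <para>
      Only adjacent characters are merged, so the two Roman stretches
      stay separate runs and the original order is preserved.
    </para>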
    <para>
      As a layout-engine implementor, you can rely on HarfBuzz for the
      interface between your text and your font, and that's something
      that you'll need. What you then do with the glyphs that your font
      returns is up to you.
    </para>
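    <para>
      As one example of what a layout engine might do with shaped
      widths, the Python sketch below packs words greedily into lines
      and computes the inter-word space needed to justify each line.
      It is an assumption-laden illustration: the
      <literal>measure</literal> function is a stand-in that pretends
      every character is 10 units wide, where a real engine would
      shape each word and sum its glyph advances, and the names
      <literal>fit_lines</literal> and
      <literal>justified_space</literal> are hypothetical.
    </para>
    <programlisting language="python"><![CDATA[
```python
def measure(word):
    """Stand-in for shaping: pretend every character is 10 units wide."""
    return 10 * len(word)


def fit_lines(words, line_width, space_width):
    """Greedily pack words into lines no wider than line_width."""
    lines, current, used = [], [], 0
    for word in words:
        width = measure(word)
        needed = width if not current else used + space_width + width
        if current and needed > line_width:
            # The word does not fit: close this line and start a new one.
            lines.append(current)
            current, needed = [], width
        current.append(word)
        used = needed
    if current:
        lines.append(current)
    return lines


def justified_space(line, line_width):
    """Space to put between words so that the line exactly fills line_width."""
    gaps = len(line) - 1
    if gaps == 0:
        return 0
    return (line_width - sum(measure(w) for w in line)) / gaps


words = "a huge breakfast was served".split()
for line in fit_lines(words, line_width=120, space_width=5):
    print(line, justified_space(line, 120))
```
]]></programlisting>
    <para>
      A word wider than the line is simply placed on a line of its
      own, and a production engine would normally leave the last line
      of a paragraph unjustified; both choices are kept naive here for
      brevity.
    </para>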
  </section>
    
  <section id="why-is-it-called-harfbuzz">
    <title>Why is it called HarfBuzz?</title>
    <para>
      HarfBuzz began its life as text-shaping code within the FreeType
      project (and you will see references to the FreeType authors
      within the source code copyright declarations), but was then
      extracted out to its own project. This project is maintained by
      Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
      shaping engine for OpenType fonts&mdash;&quot;HarfBuzz&quot; is
      the Persian for &quot;open type&quot;.
    </para>
  </section>
</chapter>