-rw-r--r-- docs/usermanual-clusters.xml 743
1 files changed, 473 insertions, 270 deletions
diff --git a/docs/usermanual-clusters.xml b/docs/usermanual-clusters.xml
index 7b2c7adc..f7db0f59 100644
--- a/docs/usermanual-clusters.xml
+++ b/docs/usermanual-clusters.xml
@@ -5,306 +5,509 @@
<!ENTITY version SYSTEM "version.xml">
]>
<chapter id="clusters">
-<sect1 id="clusters">
<title>Clusters</title>
- <para>
- In shaping text, a <emphasis>cluster</emphasis> is a sequence of
- code points that needs to be treated as a single, indivisible unit.
- </para>
- <para>
- When you add text to a HB buffer, each character is associated with
- a <emphasis>cluster value</emphasis>. This is an arbitrary number as
- far as HB is concerned.
- </para>
- <para>
- Most clients will use UTF-8, UTF-16, or UTF-32 indices, but the
- actual number does not matter. Moreover, it is not required for the
- cluster values to be monotonically increasing, but pretty much all
- of HB's tests are performed on monotonically increasing cluster
- numbers. Nevertheless, there is no such assumption in the code
- itself. With that in mind, let's examine what happens with cluster
- values during shaping under each cluster-level.
- </para>
- <para>
- HarfBuzz provides three <emphasis>levels</emphasis> of clustering
- support. Level 0 is the default behavior and reproduces the behavior
- of the old HarfBuzz library. Level 1 tweaks this behavior slightly
- to produce better results, so level 1 clustering is recommended for
- code that is not required to implement backward compatibility with
- the old HarfBuzz.
- </para>
- <para>
- Level 2 differs significantly in how it treats cluster values.
- Levels 0 and 1 both process ligatures and glyph decomposition by
- merging clusters; level 2 does not.
- </para>
- <para>
- The conceptual model for what the cluster values mean, in levels 0
- and 1, is this:
- </para>
- <itemizedlist spacing="compact">
- <listitem>
- <para>
- the sequence of cluster values will always remain monotone
- </para>
- </listitem>
- <listitem>
- <para>
- each value represents a single cluster
- </para>
- </listitem>
- <listitem>
- <para>
- each cluster contains one or more glyphs and one or more
- characters
- </para>
- </listitem>
- </itemizedlist>
- <para>
- Assuming that initial cluster numbers were monotonically increasing
- and distinct, then all adjacent glyphs having the same cluster
- number belong to the same cluster, and all characters belong to the
- cluster that has the highest number not larger than their initial
- cluster number. This will become clearer with an example.
- </para>
-</sect1>
-<sect1 id="a-clustering-example-for-levels-0-and-1">
- <title>A clustering example for levels 0 and 1</title>
- <para>
- Let's say we start with the following character sequence and cluster
- values:
- </para>
- <programlisting>
- A,B,C,D,E
- 0,1,2,3,4
-</programlisting>
- <para>
- We then map the characters to glyphs. For simplicity, let's assume
- that each character maps to the corresponding, identical-looking
- glyph:
- </para>
- <programlisting>
- A,B,C,D,E
- 0,1,2,3,4
-</programlisting>
- <para>
- Now if, for example, <literal>B</literal> and <literal>C</literal>
- ligate, then the clusters to which they belong &quot;merge&quot;.
- This merged cluster takes for its cluster number the minimum of all
- the cluster numbers of the clusters that went in. In this case, we
- get:
- </para>
- <programlisting>
- A,BC,D,E
- 0,1 ,3,4
-</programlisting>
- <para>
- Now let's assume that the <literal>BC</literal> glyph decomposes
- into three components, and <literal>D</literal> also decomposes into
- two. The components each inherit the cluster value of their parent:
- </para>
- <programlisting>
- A,BC0,BC1,BC2,D0,D1,E
- 0,1 ,1 ,1 ,3 ,3 ,4
-</programlisting>
- <para>
- Now if <literal>BC2</literal> and <literal>D0</literal> ligate, then
- their clusters (numbers 1 and 3) merge into
- <literal>min(1,3) = 1</literal>:
- </para>
- <programlisting>
- A,BC0,BC1,BC2D0,D1,E
- 0,1 ,1 ,1 ,1 ,4
-</programlisting>
- <para>
- At this point, cluster 1 means: the character sequence
- <literal>BCD</literal> is represented by glyphs
- <literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
- further.
- </para>
-</sect1>
-<sect1 id="reordering-in-levels-0-and-1">
- <title>Reordering in levels 0 and 1</title>
- <para>
- Another common operation in the more complex shapers is when things
- reorder. In those cases, to maintain monotone clusters, HB merges
- the clusters of everything in the reordering sequence. For example,
- let's again start with the character sequence:
- </para>
- <programlisting>
- A,B,C,D,E
- 0,1,2,3,4
-</programlisting>
- <para>
- If <literal>D</literal> is reordered before <literal>B</literal>,
- then the <literal>B</literal>, <literal>C</literal>, and
- <literal>D</literal> clusters merge, and we get:
- </para>
- <programlisting>
- A,D,B,C,E
- 0,1,1,1,4
-</programlisting>
- <para>
- This is clearly not ideal, but it is the only sensible way to
- maintain monotone indices and retain the true relationship between
- glyphs and characters.
- </para>
-</sect1>
-<sect1 id="the-distinction-between-levels-0-and-1">
- <title>The distinction between levels 0 and 1</title>
- <para>
- So, the above is pretty much what cluster levels 0 and 1 do. The
- only difference between the two is this: in level 0, at the very
- beginning of the shaping process, we also merge clusters between
- base characters and all Unicode marks (combining or not) following
- them. E.g.:
- </para>
- <programlisting>
- A,acute,B
- 0,1 ,2
-</programlisting>
- <para>
- will become:
- </para>
- <programlisting>
- A,acute,B
- 0,0 ,2
-</programlisting>
- <para>
- This is the default behavior. We do it because Windows did it and
- old HarfBuzz did it, so this remained the default. But this behavior
- makes it impossible to color diacritic marks differently from their
- base characters. That's why in level 1 we do not perform this
- initial merging step.
- </para>
- <para>
- For clients, level 0 is more convenient if they rely on HarfBuzz
- clusters for cursor positioning. But that's wrong anyway: cursor
- positions should be determined based on Unicode grapheme boundaries,
- NOT shaping clusters. As such, level 1 clusters are preferred.
- </para>
- <para>
- One last note about levels 0 and 1. We currently don't allow a
- <literal>MultipleSubst</literal> lookup to replace a glyph with zero
- glyphs (i.e., to delete a glyph). But in some other situations,
- glyphs can be deleted. In those cases, if the glyph being deleted is
- the last glyph of its cluster, we make sure to merge the cluster
- with a neighboring cluster.
- </para>
- <para>
- This is, primarily, to make sure that the starting cluster of the
- text always has the cluster index pointing to the start of the text
- for the run; more than one client currently relies on this
- guarantee.
- </para>
- <para>
- Incidentally, Apple's CoreText does something else to maintain the
- same promise: it inserts a glyph with id 65535 at the beginning of
- the glyph string if the glyph corresponding to the first character
- in the run was deleted. HarfBuzz might do something similar in the
- future.
- </para>
-</sect1>
-<sect1 id="level-2">
- <title>Level 2</title>
- <para>
- Level 2 is a different beast from levels 0 and 1. It is simple to
- describe, but hard to make sense of. It simply doesn't do any
- cluster merging whatsoever. When things ligate or otherwise multiple
- glyphs turn into one, the cluster value of the first glyph is
- retained.
- </para>
- <para>
- Here are a few examples of why processing cluster values produced at
- this level might be tricky:
- </para>
- <sect2 id="ligatures-with-combining-marks">
- <title>Ligatures with combining marks</title>
- <para>
- Imagine capital letters are bases and lower case letters are
- combining marks. With an input sequence like this:
+ <section id="clusters">
+ <title>Clusters</title>
+ <para>
+ In text shaping, a <emphasis>cluster</emphasis> is a sequence of
+ characters that needs to be treated as a single, indivisible
+ unit.
</para>
- <programlisting>
- A,a,B,b,C,c
- 0,1,2,3,4,5
-</programlisting>
<para>
- if <literal>A,B,C</literal> ligate, then here are the cluster
- values one would get under the various levels:
+ During the shaping process, some shaping operations may
+ merge adjacent characters (for example, when two code points form
+ a ligature and are replaced by a single glyph) or split one
+ character into several (for example, when performing the Unicode
+ canonical decomposition of a code point).
</para>
<para>
- level 0:
+ HarfBuzz tracks clusters independently of how these
+ shaping operations alter the individual glyphs that make up the
+ output HarfBuzz returns in a buffer. Consequently,
+ a client program can use the cluster
+ information to implement features such as:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Correctly positioning the cursor between two characters that
+ have combined into a single glyph by forming a ligature.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Correctly highlighting a text selection that includes some,
+ but not all, of the characters comprising a ligature.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Applying text attributes (such as color or underlining) to
+ part, but not all, of a composed base-and-mark combination.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Generating output document formats (such as PDF) with
+ embedded text that can be fully extracted.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Performing line-breaking, justification, and other
+ line-level or paragraph-level operations that must be done
+ after shaping is complete, but which require character-level
+ properties.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ When you add text to a HarfBuzz buffer, each code point is assigned
+ a <emphasis>cluster value</emphasis>.
+ </para>
+ <para>
+ This cluster value is an arbitrary number; HarfBuzz uses it only
+ to distinguish between clusters. For convenience, many client
+ programs use the index of each code point in the input text as
+ its cluster value; the actual value does not matter.
+ </para>
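+ <para>
+ As an illustration of the byte-offset convention, the following
+ C sketch records where each code point starts in a UTF-8
+ string. <literal>hb_buffer_add_utf8()</literal> assigns cluster
+ values the same way, using each code point's byte offset from the
+ start of the text. The helper <literal>utf8_cluster_values()</literal>
+ is hypothetical, does not call HarfBuzz, and assumes well-formed,
+ NUL-terminated UTF-8 input.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HarfBuzz): for each code point in a
 * NUL-terminated UTF-8 string, record the byte offset at which it starts.
 * Those offsets serve as distinct, increasing cluster values.
 * Returns the number of values written, at most max_out. */
static size_t utf8_cluster_values(const char *text, uint32_t *clusters,
                                  size_t max_out)
{
    assert(text != NULL && clusters != NULL);
    const unsigned char *s = (const unsigned char *) text;
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0' && count < max_out; i++) {
        /* Continuation bytes (10xxxxxx) never begin a code point. */
        if ((s[i] & 0xC0) != 0x80)
            clusters[count++] = (uint32_t) i;
    }
    return count;
}
```
+ <para>
+ For the three code points a, é, and b, the recorded values are
+ 0, 1, and 3, because é occupies two bytes in UTF-8.
+ </para>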
+ <para>
+ Client programs can choose how HarfBuzz handles clusters during
+ shaping by setting the buffer's
+ <literal>cluster_level</literal> property with
+ <literal>hb_buffer_set_cluster_level()</literal>. HarfBuzz offers
+ three <emphasis>levels</emphasis> of clustering support for this property:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para><emphasis>Level 0</emphasis> is the default and
+ reproduces the behavior of the old HarfBuzz library.
+ </para>
+ <para>
+ The distinguishing feature of level 0 behavior is that, at
+ the beginning of processing the buffer, all code points that
+ are categorized as <emphasis>marks</emphasis>,
+ <emphasis>modifier symbols</emphasis>, or
+ <emphasis>Emoji extended pictographic</emphasis> modifiers,
+ as well as the <emphasis>Zero Width Joiner</emphasis> and
+ <emphasis>Zero Width Non-Joiner</emphasis> code points, are
+ assigned the cluster value of the closest preceding code
+ point from a <emphasis>different</emphasis> category.
+ </para>
+ <para>
+ In essence, whenever a base character is followed by a mark
+ character or a sequence of mark characters, those marks are
+ reassigned to the same initial cluster value as the base
+ character. This reassignment is referred to as
+ "merging" the affected clusters. This behavior is based on
+ the Grapheme Cluster Boundary specification in <ulink
+ url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
+ Technical Report 29</ulink>.
+ </para>
+ <para>
+ Client programs can specify level 0 behavior for a buffer by
+ setting its <literal>cluster_level</literal> to
+ <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <emphasis>Level 1</emphasis> tweaks the old behavior
+ slightly to produce better results. Therefore, level 1
+ clustering is recommended for code that is not required to
+ implement backward compatibility with the old HarfBuzz.
+ </para>
+ <para>
+ Level 1 differs from level 0 by not merging the
+ clusters of marks and other modifier code points with the
+ preceding "base" code point's cluster. By preserving the
+ cluster values of these marks and modifier code points,
+ script-specific shaping can perform additional operations that might
+ lead to improved results (for example, reordering a sequence
+ of marks).
+ </para>
+ <para>
+ Client programs can specify level 1 behavior for a buffer by
+ setting its <literal>cluster_level</literal> to
+ <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <emphasis>Level 2</emphasis> differs significantly in how it
+ treats cluster values. In level 2, HarfBuzz never merges
+ clusters.
+ </para>
+ <para>
+ This difference can be seen most clearly when HarfBuzz processes
+ ligature substitutions and glyph decompositions. In level 0
+ and level 1, ligatures and glyph decomposition both involve
+ merging clusters; in level 2, neither of these operations
+ triggers a merge.
+ </para>
+ <para>
+ Client programs can specify level 2 behavior for a buffer by
+ setting its <literal>cluster_level</literal> to
+ <literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
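+ <para>
+ The level 0 merging step can be modeled in a few lines of C. This
+ is a simplified sketch, not HarfBuzz's implementation: the caller
+ supplies a hypothetical <literal>is_continuation</literal> flag for
+ each code point (marks, ZWJ, ZWNJ, and so on) instead of deriving
+ it from Unicode properties, and the input cluster values are
+ assumed to be monotonic. Levels 1 and 2 simply skip this step.
+ </para>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the level 0 start-of-shaping merge (not HarfBuzz's
 * code).  is_continuation[i] is a hypothetical caller-supplied flag that is
 * true for marks, ZWJ, ZWNJ, and the other code points listed above.
 * Each flagged code point takes the value of the code point before it,
 * which, after earlier iterations, is the value of the nearest preceding
 * unflagged code point.  Levels 1 and 2 skip this step. */
static void merge_marks_level0(uint32_t *clusters, const bool *is_continuation,
                               size_t n)
{
    assert(n == 0 || (clusters != NULL && is_continuation != NULL));
    for (size_t i = 1; i < n; i++) {
        if (is_continuation[i])
            clusters[i] = clusters[i - 1];
    }
}
```
+ <para>
+ With the sequence <literal>A,acute,B</literal> and initial values
+ 0, 1, 2, flagging only <literal>acute</literal> yields 0, 0, 2.
+ </para>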
+ <para>
+ It is not <emphasis>required</emphasis> that the cluster values
+ in a buffer be monotonically increasing. However, if the initial
+ cluster values in a buffer are monotonic and the buffer is
+ configured to use clustering level 0 or 1, then HarfBuzz
+ guarantees that the final cluster values in the shaped buffer
+ will also be monotonic. No such guarantee is made for cluster
+ level 2.
+ </para>
+ <para>
+ In levels 0 and 1, HarfBuzz implements the following conceptual model for
+ cluster values:
+ </para>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ The sequence of cluster values will always remain monotonic.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Each cluster value represents a single cluster.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Each cluster contains one or more glyphs and one or more
+ characters.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ In practice, this model offers two useful guarantees. If
+ the initial cluster values were monotonically increasing
+ and distinct before shaping began, then, in the final output:
+ </para>
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ All adjacent glyphs having the same final cluster
+ value belong to the same cluster.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Each character belongs to the cluster that has the highest
+ cluster value <emphasis>not larger than</emphasis> its
+ initial cluster value.
+ </para>
+ </listitem>
+ </itemizedlist>
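+ <para>
+ A client can apply the second guarantee directly. The hypothetical
+ helper <literal>owning_cluster()</literal> below scans the final
+ glyph cluster values of a level 0 or level 1 buffer and returns the
+ cluster that owns a character with a given initial cluster value.
+ It assumes the final values are monotonically non-decreasing, as
+ guaranteed above.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side helper for levels 0 and 1.  glyph_clusters holds
 * the final cluster values of the shaped buffer, which are assumed to be
 * monotonically non-decreasing.  Returns the highest final value that is
 * not larger than initial_value: the cluster owning that character. */
static uint32_t owning_cluster(const uint32_t *glyph_clusters, size_t n,
                               uint32_t initial_value)
{
    assert(n > 0 && glyph_clusters[0] <= initial_value);
    uint32_t best = glyph_clusters[0];
    /* Linear scan for clarity; a binary search suits long buffers better. */
    for (size_t i = 1; i < n && glyph_clusters[i] <= initial_value; i++)
        best = glyph_clusters[i];
    return best;
}
```
+ <para>
+ For final values 0, 1, 1, 1, 1, 4, a character whose initial value
+ was 3 belongs to cluster 1.
+ </para>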
+
+ </section>
+ <section id="a-clustering-example-for-levels-0-and-1">
+ <title>A clustering example for levels 0 and 1</title>
+ <para>
+ The guarantees and benefits of level 0 and level 1 can be seen
+ with some examples. First, let us examine what happens with cluster
+ values when shaping involves cluster merging with ligatures and
+ decomposition.
+ </para>
+ <para>
+ Let's say we start with the following character sequence (top row) and
+ initial cluster values (bottom row):
</para>
<programlisting>
- ABC,a,b,c
- 0 ,0,0,0
-</programlisting>
+ A,B,C,D,E
+ 0,1,2,3,4
+ </programlisting>
<para>
- level 1:
+ During shaping, HarfBuzz maps these characters to glyphs from
+ the font. For simplicity, let's assume that each character maps
+ to the corresponding, identical-looking glyph:
</para>
<programlisting>
- ABC,a,b,c
- 0 ,0,0,5
-</programlisting>
+ A,B,C,D,E
+ 0,1,2,3,4
+ </programlisting>
<para>
- level 2:
+ Now if, for example, <literal>B</literal> and <literal>C</literal>
+ form a ligature, then the clusters to which they belong
+ &quot;merge&quot;. This merged cluster takes for its cluster
+ value the minimum of all the cluster values of the clusters that
+ went into the ligature. In this case, we get:
</para>
<programlisting>
- ABC,a,b,c
- 0 ,1,3,5
-</programlisting>
+ A,BC,D,E
+ 0,1 ,3,4
+ </programlisting>
+ <para>
+ because 1 is the minimum of the set {1,2}, which holds the
+ initial cluster values of <literal>B</literal> and
+ <literal>C</literal>.
+ </para>
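+ <para>
+ The ligature step can be sketched in C by tracking cluster values
+ alone. The helper <literal>ligate()</literal> is hypothetical and
+ simplified: it replaces a range of glyphs with a single ligature
+ glyph whose cluster value is the minimum of the values in the
+ range, and it ignores glyph IDs as well as any glyphs outside the
+ range that share a cluster value with it.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified level 0/1 ligature step on cluster values only:
 * glyphs [start, end) become one ligature glyph whose cluster value is the
 * minimum of the values in the range.  Returns the new glyph count. */
static size_t ligate(uint32_t *clusters, size_t n, size_t start, size_t end)
{
    assert(start < end && end <= n);
    uint32_t lowest = clusters[start];
    for (size_t i = start + 1; i < end; i++)
        if (clusters[i] < lowest)
            lowest = clusters[i];
    clusters[start] = lowest;

    /* Close the gap left by the glyphs absorbed into the ligature. */
    size_t removed = end - start - 1;
    for (size_t i = end; i < n; i++)
        clusters[i - removed] = clusters[i];
    return n - removed;
}
```
+ <para>
+ Applied to 0, 1, 2, 3, 4 with the range covering
+ <literal>B</literal> and <literal>C</literal> (indices 1 and 2),
+ it leaves 0, 1, 3, 4.
+ </para>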
<para>
- Making sense of the last example is the hardest for a client,
- because there is nothing in the cluster values to suggest that
- <literal>B</literal> and <literal>C</literal> ligated with
- <literal>A</literal>.
+ Next, let us say that the <literal>BC</literal> ligature glyph
+ decomposes into three components, and <literal>D</literal> also
+ decomposes into two components. These components each inherit the
+ cluster value of their parent:
</para>
- </sect2>
- <sect2 id="reordering">
- <title>Reordering</title>
+ <programlisting>
+ A,BC0,BC1,BC2,D0,D1,E
+ 0,1 ,1 ,1 ,3 ,3 ,4
+ </programlisting>
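+ <para>
+ Decomposition works in the opposite direction: one glyph becomes
+ several, and each component copies the parent's cluster value. The
+ hypothetical helper <literal>decompose()</literal> sketches this on
+ an array of cluster values; the caller must provide enough capacity
+ for the extra entries.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decomposition step on cluster values only: the glyph at
 * index pos becomes `parts` components, each inheriting its cluster value.
 * The array must have room for n + parts - 1 entries.
 * Returns the new glyph count. */
static size_t decompose(uint32_t *clusters, size_t n, size_t pos, size_t parts)
{
    assert(pos < n && parts > 0);
    size_t extra = parts - 1;
    /* Shift the tail right, last element first, so nothing is overwritten. */
    for (size_t i = n; i-- > pos + 1;)
        clusters[i + extra] = clusters[i];
    /* Every new component copies the parent's cluster value. */
    for (size_t k = 1; k < parts; k++)
        clusters[pos + k] = clusters[pos];
    return n + extra;
}
```
+ <para>
+ Starting from 0, 1, 3, 4, decomposing index 1 into three components
+ and then index 4 into two gives 0, 1, 1, 1, 3, 3, 4.
+ </para>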
<para>
- Another tricky case is when things reorder. Under level 2:
+ Next, if <literal>BC2</literal> and <literal>D0</literal> form a
+ ligature, then their clusters (cluster values 1 and 3) merge into
+ <literal>min(1,3) = 1</literal>:
</para>
<programlisting>
- A,B,C,D,E
- 0,1,2,3,4
-</programlisting>
+ A,BC0,BC1,BC2D0,D1,E
+ 0,1 ,1 ,1 ,1 ,4
+ </programlisting>
+ <para>
+ At this point, cluster 1 means: the character sequence
+ <literal>BCD</literal> is represented by glyphs
+ <literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
+ further.
+ </para>
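+ <para>
+ The merge in this last step reaches beyond the two glyphs that
+ formed the ligature: <literal>D1</literal>, which shared cluster 3
+ with <literal>D0</literal>, also ends up in cluster 1. A simplified
+ model of this widens the merge range over adjacent glyphs that
+ share a cluster value with either end before taking the minimum.
+ The function <literal>merge_cluster_range()</literal> is an
+ illustrative sketch under that assumption, not HarfBuzz's source
+ code.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified level 0/1 merge over glyphs [start, end).
 * The range is first widened to take in adjacent glyphs that share a
 * cluster value with its first or last glyph, so no cluster ends up split;
 * then every glyph in the widened range gets the range's minimum value. */
static void merge_cluster_range(uint32_t *clusters, size_t n,
                                size_t start, size_t end)
{
    assert(start < end && end <= n);
    while (end < n && clusters[end] == clusters[end - 1])
        end++;
    while (start > 0 && clusters[start - 1] == clusters[start])
        start--;

    uint32_t lowest = clusters[start];
    for (size_t i = start + 1; i < end; i++)
        if (clusters[i] < lowest)
            lowest = clusters[i];
    for (size_t i = start; i < end; i++)
        clusters[i] = lowest;
}
```
+ <para>
+ Calling it on 0, 1, 1, 1, 3, 3, 4 with the range covering
+ <literal>BC2</literal> and <literal>D0</literal> (indices 3 and 4)
+ produces 0, 1, 1, 1, 1, 1, 4; replacing those two glyphs with the
+ single <literal>BC2D0</literal> glyph then gives 0, 1, 1, 1, 1, 4.
+ </para>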
+ </section>
+ <section id="reordering-in-levels-0-and-1">
+ <title>Reordering in levels 0 and 1</title>
+ <para>
+ Another common operation in the more complex shapers is glyph
+ reordering. In order to maintain a monotonic cluster sequence
+ when glyph reordering takes place, HarfBuzz merges the clusters
+ of everything in the reordering sequence.
+ </para>
<para>
- Now imagine <literal>D</literal> moves before
- <literal>B</literal>:
+ For example, let us again start with the character sequence (top
+ row) and initial cluster values (bottom row):
</para>
<programlisting>
- A,D,B,C,E
- 0,3,1,2,4
-</programlisting>
+ A,B,C,D,E
+ 0,1,2,3,4
+ </programlisting>
<para>
- Now, if <literal>D</literal> ligates with <literal>B</literal>, we
+ If <literal>D</literal> is reordered before <literal>B</literal>,
+ then HarfBuzz merges the <literal>B</literal>,
+ <literal>C</literal>, and <literal>D</literal> clusters, and we
get:
</para>
<programlisting>
- A,DB,C,E
- 0,3 ,2,4
-</programlisting>
+ A,D,B,C,E
+ 0,1,1,1,4
+ </programlisting>
<para>
- In a different scenario, <literal>A</literal> and
- <literal>B</literal> could have ligated
- <emphasis>before</emphasis> <literal>D</literal> reordered; that
- would have resulted in:
+ This is clearly not ideal, but it is the only sensible way to
+ maintain a monotonic sequence of cluster values and retain the
+ true relationship between glyphs and characters.
+ </para>
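+ <para>
+ The reordering merge can be sketched the same way. The hypothetical
+ <literal>reorder_and_merge()</literal> moves one glyph earlier in
+ the buffer and then assigns every glyph in the affected span the
+ smallest cluster value found in it; glyphs are represented by single
+ letters for readability.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical level 0/1 reordering model.  glyphs[] and clusters[] are
 * parallel arrays; the glyph at index `from` moves to index `to` (to < from),
 * and every glyph in the affected span [to, from] then receives the
 * smallest cluster value found in that span. */
static void reorder_and_merge(char *glyphs, uint32_t *clusters, size_t n,
                              size_t to, size_t from)
{
    assert(to < from && from < n);
    char moved_glyph = glyphs[from];
    uint32_t moved_cluster = clusters[from];
    for (size_t i = from; i > to; i--) {
        glyphs[i] = glyphs[i - 1];
        clusters[i] = clusters[i - 1];
    }
    glyphs[to] = moved_glyph;
    clusters[to] = moved_cluster;

    uint32_t lowest = clusters[to];
    for (size_t i = to + 1; i <= from; i++)
        if (clusters[i] < lowest)
            lowest = clusters[i];
    for (size_t i = to; i <= from; i++)
        clusters[i] = lowest;
}
```
+ <para>
+ Moving <literal>D</literal> (index 3) to index 1 in
+ <literal>A,B,C,D,E</literal> with values 0 to 4 yields
+ <literal>A,D,B,C,E</literal> with values 0, 1, 1, 1, 4.
+ </para>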
+ </section>
+ <section id="the-distinction-between-levels-0-and-1">
+ <title>The distinction between levels 0 and 1</title>
+ <para>
+ The preceding examples demonstrate the main effects of using
+ cluster levels 0 and 1. The only difference between the two
+ levels is this: in level 0, at the very beginning of the shaping
+ process, HarfBuzz also merges clusters between any base character
+ and all Unicode marks (combining or not) that follow it.
+ </para>
+ <para>
+ For example, let us start with the following character sequence
+ (top row) and accompanying initial cluster values (bottom row):
+ </para>
+ <programlisting>
+ A,acute,B
+ 0,1 ,2
+ </programlisting>
+ <para>
+ The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
+ using cluster level 0 on this sequence, then the
+ <literal>A</literal> and <literal>acute</literal> clusters will
+ merge, and the result will become:
</para>
<programlisting>
- AB,D,C,E
- 0 ,3,2,4
-</programlisting>
+ A,acute,B
+ 0,0 ,2
+ </programlisting>
+ <para>
+ This initial cluster merging is the default behavior of the
+ Windows shaping engine, and the old HarfBuzz codebase copied
+ that behavior to maintain compatibility. Consequently, it has
+ remained the default behavior in the new HarfBuzz codebase.
+ </para>
+ <para>
+ But this initial cluster-merging behavior makes it impossible to
+ color diacritic marks differently from their base
+ characters. That is why, in level 1, HarfBuzz does not perform
+ the initial merging step.
+ </para>
+ <para>
+ For client programs that rely on HarfBuzz cluster values to
+ perform cursor positioning, level 0 is more convenient. But
+ relying on cluster boundaries for cursor positioning is wrong: cursor
+ positions should be determined based on Unicode grapheme
+ boundaries, not on shaping-cluster boundaries. As such, level 1
+ clusters are preferred.
+ </para>
+ <para>
+ One last note about levels 0 and 1. HarfBuzz currently does not allow a
+ <literal>MultipleSubst</literal> lookup to replace a glyph with zero
+ glyphs (in other words, to delete a glyph). But, in some other situations,
+ glyphs can be deleted. In those cases, if the glyph being deleted is
+ the last glyph of its cluster, HarfBuzz makes sure to merge the cluster
+ with a neighboring cluster.
+ </para>
+ <para>
+ This is done primarily to make sure that the starting cluster of the
+ text always has the cluster index pointing to the start of the text
+ for the run; more than one client currently relies on this
+ guarantee.
+ </para>
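+ <para>
+ The deletion rule can be modeled as follows. This is a simplified,
+ hypothetical sketch of the behavior described above rather than
+ HarfBuzz's code: if the deleted glyph was the only one carrying its
+ cluster value, that cluster is folded into the preceding cluster
+ when one exists, and otherwise into the following cluster, which
+ then takes over the deleted glyph's smaller value.
+ </para>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of deleting the glyph at index pos (levels 0 and 1).
 * Returns the new glyph count. */
static size_t delete_glyph(uint32_t *clusters, size_t n, size_t pos)
{
    assert(pos < n);
    uint32_t cluster = clusters[pos];
    bool survives = (pos > 0 && clusters[pos - 1] == cluster) ||
                    (pos + 1 < n && clusters[pos + 1] == cluster);

    if (!survives && pos > 0) {
        /* Fold into the preceding cluster.  With monotonic input the
         * preceding value is already smaller, so no glyph changes and the
         * deleted characters simply fall into the preceding cluster. */
        uint32_t old = clusters[pos - 1];
        for (size_t i = pos; i > 0 && cluster < old && clusters[i - 1] == old; i--)
            clusters[i - 1] = cluster;
    } else if (!survives && pos + 1 < n) {
        /* First glyph deleted: the following cluster takes over its value,
         * so the buffer still starts at the run's first cluster value. */
        uint32_t old = clusters[pos + 1];
        for (size_t i = pos + 1; i < n && cluster < old && clusters[i] == old; i++)
            clusters[i] = cluster;
    }

    /* Remove the glyph by shifting the tail left. */
    for (size_t i = pos + 1; i < n; i++)
        clusters[i - 1] = clusters[i];
    return n - 1;
}
```
+ <para>
+ Deleting the first glyph of a buffer with cluster values 0, 1, 2
+ leaves 0, 2, so the run still starts at cluster 0.
+ </para>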
+ <para>
+ Incidentally, Apple's CoreText does something else to maintain the
+ same promise: it inserts a glyph with id 65535 at the beginning of
+ the glyph string if the glyph corresponding to the first character
+ in the run was deleted. HarfBuzz might do something similar in the
+ future.
+ </para>
+ </section>
+ <section id="level-2">
+ <title>Level 2</title>
+ <para>
+ HarfBuzz's level 2 cluster behavior uses a model significantly
+ different from that of levels 0 and 1.
+ </para>
<para>
- There's no way to differentiate between these two scenarios based
- on the cluster numbers alone.
+ The level 2 behavior is easy to describe, but it may be
+ difficult to understand in practical terms. In brief, level 2
+ performs no merging of clusters whatsoever.
</para>
<para>
- Another problem happens with ligatures under level 2 if the
- direction of the text is forced to opposite of its natural
- direction (e.g. left-to-right Arabic). But that's too much of a
- corner case to worry about.
+ When glyphs form a ligature (or when some other feature
+ replaces multiple glyphs with a single glyph), the cluster value
+ of the first glyph is retained as the cluster value for the
+ ligature. However, no subsequent clusters, including those of
+ marks and modifiers, are affected.
</para>
- </sect2>
-</sect1>
+ <para>
+ Level 2 cluster behavior is less complex than level 0 or level
+ 1, but there are a few cases in which processing cluster values
+ produced at level 2 may be tricky.
+ </para>
+ <section id="ligatures-with-combining-marks-in-level-2">
+ <title>Ligatures with combining marks in level 2</title>
+ <para>
+ The first example of how HarfBuzz's level 2 cluster behavior
+ can be tricky is when the text to be shaped includes combining
+ marks attached to ligatures.
+ </para>
+ <para>
+ Let us start with an input sequence with the following
+ characters (top row) and initial cluster values (bottom row):
+ </para>
+ <programlisting>
+ A,acute,B,breve,C,circumflex
+ 0,1 ,2,3 ,4,5
+ </programlisting>
+ <para>
+ If the sequence <literal>A,B,C</literal> forms a ligature,
+ then these are the cluster values HarfBuzz will return under
+ the various cluster levels:
+ </para>
+ <para>
+ Level 0:
+ </para>
+ <programlisting>
+ ABC,acute,breve,circumflex
+ 0 ,0 ,0 ,0
+ </programlisting>
+ <para>
+ Level 1:
+ </para>
+ <programlisting>
+ ABC,acute,breve,circumflex
+ 0 ,0 ,0 ,5
+ </programlisting>
+ <para>
+ Level 2:
+ </para>
+ <programlisting>
+ ABC,acute,breve,circumflex
+ 0 ,1 ,3 ,5
+ </programlisting>
+ <para>
+ Making sense of the level 2 result is the hardest for a client
+ program, because there is nothing in the cluster values that
+ indicates that <literal>B</literal> and <literal>C</literal>
+ formed a ligature with <literal>A</literal>.
+ </para>
+ <para>
+ In contrast, the "merged" cluster values of the mark glyphs
+ that are seen in the level 0 and level 1 output are evidence
+ that a ligature substitution took place.
+ </para>
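+ <para>
+ The three listings above can be reproduced with a toy model. The
+ function <literal>shape_example()</literal> is hypothetical and
+ hard-codes this single example: it applies the mark merge only at
+ level 0, merges the ligature's span (widened over following glyphs
+ that share the last cluster value) at levels 0 and 1, and performs
+ no merging at level 2. As in the listings, it assumes the marks
+ follow the ligature glyph in their original order.
+ </para>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_level { TOY_LEVEL_0, TOY_LEVEL_1, TOY_LEVEL_2 };

/* Hypothetical toy model of the example above only.  Input code points:
 *   A, acute, B, breve, C, circumflex   with initial cluster values 0..5.
 * A, B and C form a ligature; the marks follow it in their original order.
 * Writes the output values for ABC, acute, breve, circumflex into out. */
static void shape_example(enum toy_level level, uint32_t out[4])
{
    assert(out != NULL);
    uint32_t c[6] = {0, 1, 2, 3, 4, 5};
    const bool is_mark[6] = {false, true, false, true, false, true};
    const size_t n = 6;

    /* Level 0 only: each mark first takes its base's cluster value. */
    if (level == TOY_LEVEL_0)
        for (size_t i = 1; i < n; i++)
            if (is_mark[i])
                c[i] = c[i - 1];

    /* Levels 0 and 1: merge the ligature's span A..C (indices 0-4), widened
     * over following glyphs that share C's cluster, into its minimum value.
     * Level 2 skips this, so the ligature just keeps A's value. */
    if (level != TOY_LEVEL_2) {
        size_t end = 5;
        while (end < n && c[end] == c[end - 1])
            end++;
        uint32_t lowest = c[0];
        for (size_t i = 1; i < end; i++)
            if (c[i] < lowest)
                lowest = c[i];
        for (size_t i = 0; i < end; i++)
            c[i] = lowest;
    }

    out[0] = c[0]; /* ABC ligature */
    out[1] = c[1]; /* acute */
    out[2] = c[3]; /* breve */
    out[3] = c[5]; /* circumflex */
}
```
+ <para>
+ The circumflex keeps value 5 at level 1 because, without the
+ initial mark merge, it does not share a cluster with
+ <literal>C</literal> and so falls outside the widened span.
+ </para>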
+ </section>
+ <section id="reordering-in-level-2">
+ <title>Reordering in level 2</title>
+ <para>
+ Another example of how HarfBuzz's level 2 cluster behavior
+ can be tricky is when glyphs reorder. Consider an input sequence
+ with the following characters (top row) and initial cluster
+ values (bottom row):
+ </para>
+ <programlisting>
+ A,B,C,D,E
+ 0,1,2,3,4
+ </programlisting>
+ <para>
+ Now imagine <literal>D</literal> moves before
+ <literal>B</literal> in a reordering operation. The cluster
+ values will then be:
+ </para>
+ <programlisting>
+ A,D,B,C,E
+ 0,3,1,2,4
+ </programlisting>
+ <para>
+ Next, if <literal>D</literal> forms a ligature with
+ <literal>B</literal>, the output is:
+ </para>
+ <programlisting>
+ A,DB,C,E
+ 0,3 ,2,4
+ </programlisting>
+ <para>
+ However, in a different scenario, in which the shaping rules
+ of the script instead caused <literal>A</literal> and
+ <literal>B</literal> to form a ligature
+ <emphasis>before</emphasis> <literal>D</literal> was reordered, the
+ result would be:
+ </para>
+ <programlisting>
+ AB,D,C,E
+ 0 ,3,2,4
+ </programlisting>
+ <para>
+ There is no way for a client program to differentiate between
+ these two scenarios based on the cluster values
+ alone. Consequently, client programs that use level 2 might
+ need to undertake additional work in order to manage cursor
+ positioning, text attributes, or other desired features.
+ </para>
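+ <para>
+ The ambiguity can be demonstrated with two hypothetical level 2
+ helpers that never merge clusters:
+ <literal>move_glyph()</literal> moves a glyph earlier in the buffer,
+ and <literal>ligate_pair()</literal> joins two adjacent glyphs into a
+ ligature that keeps the first glyph's cluster value. Running both
+ scenarios above through them yields identical cluster values.
+ </para>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy level 2 operations on parallel arrays of glyph names
 * and cluster values.  Neither operation merges clusters. */

/* Move the glyph at index `from` to index `to` (to < from). */
static void move_glyph(const char **names, uint32_t *clusters,
                       size_t to, size_t from)
{
    assert(to < from);
    const char *name = names[from];
    uint32_t cluster = clusters[from];
    for (size_t i = from; i > to; i--) {
        names[i] = names[i - 1];
        clusters[i] = clusters[i - 1];
    }
    names[to] = name;
    clusters[to] = cluster;
}

/* Join the adjacent glyphs at pos and pos + 1 into one ligature that keeps
 * the first glyph's cluster value.  Returns the new glyph count. */
static size_t ligate_pair(const char **names, uint32_t *clusters, size_t n,
                          size_t pos, const char *ligature_name)
{
    assert(pos + 1 < n);
    names[pos] = ligature_name; /* clusters[pos] is left unchanged */
    for (size_t i = pos + 2; i < n; i++) {
        names[i - 1] = names[i];
        clusters[i - 1] = clusters[i];
    }
    return n - 1;
}
```
+ <para>
+ Either order of operations leaves the cluster values 0, 3, 2, 4, so
+ nothing in the output records which glyphs actually combined.
+ </para>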
+ </section>
+ <section id="other-considerations-in-level-2">
+ <title>Other considerations in level 2</title>
+ <para>
+ Other problems can arise with ligatures under level 2, for
+ example when the direction of the text is forced to the
+ opposite of its natural direction (such as left-to-right
+ Arabic). Generally speaking, however, these scenarios are
+ obscure corner cases that most client programs do not need
+ to worry about.
+ </para>
+ </section>
+ </section>
</chapter>