<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<link href="wayland.css" rel="stylesheet" type="text/css">
<title>Wayland</title>
</head>
<body>
<h1><a href="/"><img src="wayland.png" alt="Wayland logo"></a></h1>
<h2>Wayland Architecture</h2>
<p>A good way to understand the wayland architecture and how it is
different from X is to follow an event from the input device to the
point where the change it affects appears on screen.</p>
<p>This is where we are now with X:</p>
<p><img src="x-architecture.png" alt="X architecture diagram"></p>
<ol>
<li>
The kernel gets an event from an input device and sends it to X
through the evdev input driver. The kernel does all the hard work
here by driving the device and translating the different device
specific event protocols to the Linux evdev input event standard.
</li>
<li>
The X server determines which window the event affects and sends
it to the clients that have selected for the event in question on
that window. The X server doesn't actually know how to do this
right, since the window location on screen is controlled by the
compositor and may be transformed in a number of ways that the X
server doesn't understand (scaled down, rotated, wobbling,
etc).
</li>
<li>
The client looks at the event and decides what to do. Often the
UI will have to change in response to the event - perhaps a check
box was clicked or the pointer entered a button that must be
highlighted. Thus the client sends a rendering request back to
the X server.
</li>
<li>
When the X server receives the rendering request, it sends it to
the driver to let it program the hardware to do the rendering.
The X server also calculates the bounding region of the rendering,
and sends that to the compositor as a <em>damage event</em>.
</li>
<li>
The damage event tells the compositor that something changed in
the window and that it has to recomposite the part of the screen
where that window is visible. The compositor is responsible for
rendering the entire screen contents based on its scenegraph and
the contents of the X windows. Yet, it has to go through the X
server to render this.
</li>
<li>
The X server receives the rendering requests from the compositor
and either copies the compositor back buffer to the front buffer
or does a pageflip. In the general case, the X server has to do
this step itself so that it can account for overlapping windows
(which may require clipping) and determine whether or not it can
page flip.
However, for a compositor, which is always fullscreen, this is
another unnecessary context switch.
</li>
</ol>
<p>
As suggested above, there are a few problems with this approach.
The X server doesn't have the information to decide which window
should receive the event, nor can it transform the screen
coordinates to window local coordinates. And even though X has
handed responsibility for the final painting of the screen to the
compositing manager, X still controls the front buffer and
modesetting. Most of the complexity that the X server used to
handle is now available in the kernel or in self-contained libraries
(KMS, evdev, mesa, fontconfig, freetype, cairo, Qt etc). In
general, the X server is now just a middle man that introduces an
extra step between applications and the compositor and an extra step
between the compositor and the hardware.
</p>
<p>
In wayland the compositor <em>is</em> the display server. We
transfer the control of KMS and evdev to the compositor. The
wayland protocol lets the compositor send the input events directly
to the clients and lets the client send the damage event directly to
the compositor:
</p>
<p><img src="wayland-architecture.png" alt="Wayland architecture diagram"></p>
<ol>
<li>
The kernel gets an event and sends it to the compositor. This is
similar to the X case, which is great, since we get to reuse all
the input drivers in the kernel.
</li>
<li>
The compositor looks through its scenegraph to determine which
window should receive the event. The scenegraph corresponds to
what's on screen and the compositor understands the
transformations that it may have applied to the elements in the
scenegraph. Thus, the compositor can pick the right window and
transform the screen coordinates to window local coordinates by
applying the inverse transformations. The types of transformation
that can be applied to a window are restricted only by what the
compositor can do, as long as it can compute the inverse
transformation for the input events.
</li>
<li>
As in the X case, when the client receives the event, it updates
the UI in response. But in the wayland case, the rendering
happens in the client, and the client just sends a request to the
compositor to indicate the region that was updated.
</li>
<li>
The compositor collects damage requests from its clients and then
recomposites the screen. The compositor can then directly issue
an ioctl to schedule a pageflip with KMS.
</li>
</ol>
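<p>
Step 2 above can be sketched in a few lines of C. The transform
struct and function names here are purely illustrative, not part of
any real compositor; the sketch assumes the compositor applies a
uniform scale and a translation to the window:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical per-window transform: the compositor scales the
 * window and places it at an offset on screen. */
struct window_transform {
    double scale;      /* uniform scale factor */
    double offset_x;   /* window position on screen */
    double offset_y;
};

/* Map screen coordinates to window-local coordinates by applying
 * the inverse transform: undo the translation, then the scale. */
static void screen_to_window(const struct window_transform *t,
                             double sx, double sy,
                             double *wx, double *wy)
{
    *wx = (sx - t-&gt;offset_x) / t-&gt;scale;
    *wy = (sy - t-&gt;offset_y) / t-&gt;scale;
}

int main(void)
{
    /* A window scaled to half size, placed at (100, 50) on screen. */
    struct window_transform t = { 0.5, 100.0, 50.0 };
    double wx, wy;

    /* A click at screen (150, 100) lands at (100, 100) in the window. */
    screen_to_window(&amp;t, 150.0, 100.0, &amp;wx, &amp;wy);
    printf("window-local: (%g, %g)\n", wx, wy);
    assert(fabs(wx - 100.0) &lt; 1e-9 &amp;&amp; fabs(wy - 100.0) &lt; 1e-9);
    return 0;
}
</pre>
<p>
A real compositor would walk its scenegraph with arbitrary
transformations per element, but the principle is the same: as long
as each transform is invertible, input can always be mapped back to
window-local coordinates.
</p>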
<h2>Wayland Rendering</h2>
<p>
One of the details I left out in the above overview is how clients
actually render under wayland. By removing the X server from the
picture we also removed the mechanism by which X clients typically
render. But there's another mechanism that we're already using with
DRI2 under X: <em>direct rendering</em>. With direct rendering, the
client and the server share a video memory buffer. The client links
to a rendering library such as OpenGL that knows how to program the
hardware and renders directly into the buffer. The compositor in
turn can take the buffer and use it as a texture when it composites
the desktop. After the initial setup, the client only needs to tell
the compositor which buffer to use and when and where it has
rendered new content into it.
</p>
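<p>
The essence of the shared-buffer arrangement can be sketched with
ordinary memory standing in for video memory. This is a simplified
illustration, not the real EGL/GBM or wl_buffer API: the point is
only that the client writes and the compositor reads the very same
buffer, with no copy through a display server in between:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define WIDTH  4
#define HEIGHT 4

/* A shared buffer: in reality this would live in video memory and
 * be sampled by the compositor as a GPU texture; here plain memory
 * stands in for it. */
static uint32_t shared_buffer[WIDTH * HEIGHT];

/* The client renders directly into the shared buffer. */
static void client_render(uint32_t color)
{
    for (int i = 0; i &lt; WIDTH * HEIGHT; i++)
        shared_buffer[i] = color;
}

/* The compositor "textures" from the same memory. */
static uint32_t compositor_sample(int x, int y)
{
    return shared_buffer[y * WIDTH + x];
}

int main(void)
{
    client_render(0xff0000ffu);                      /* client draws  */
    assert(compositor_sample(1, 2) == 0xff0000ffu);  /* compositor sees it */
    printf("compositor sampled 0x%08x\n", compositor_sample(1, 2));
    return 0;
}
</pre>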
<p>
This leaves an application with two ways to update its window
contents:
</p>
<ol>
<li>
Render the new content into a new buffer and tell the compositor
to use that instead of the old buffer. The application can
allocate a new buffer every time it needs to update the window
contents or it can keep two (or more) buffers around and cycle
between them. The buffer management is entirely under application
control.
</li>
<li>
Render the new content into the buffer that it previously told the
compositor to use. While it's possible to just render directly
into the buffer shared with the compositor, this might race with
the compositor. What can happen is that repainting the window
contents could be interrupted by the compositor repainting the
desktop. If the application gets interrupted just after clearing
the window but before rendering the contents, the compositor will
texture from a blank buffer. The result is that the application
window will flicker between a blank window and half-rendered
content. The traditional way to avoid this is to render the new
content into a back buffer and then copy from there into the
compositor surface. The back buffer can be allocated on the fly
and just big enough to hold the new content, or the application
can keep a buffer around. Again, this is under application
control.
</li>
</ol>
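<p>
The first strategy, cycling between two application-owned buffers,
can be sketched as follows. The struct and function names are
illustrative, not part of the wayland API; the sketch only shows the
swap logic that keeps the compositor's buffer intact while the
application renders the next frame:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define PIXELS 16

/* Two buffers the application cycles between; buffer management is
 * entirely under application control. */
static uint32_t buffers[2][PIXELS];

struct window {
    uint32_t *front;  /* buffer currently shown by the compositor */
    uint32_t *back;   /* buffer the application renders into */
};

/* Render new content into the back buffer, then swap: the old back
 * buffer is the one the compositor is now told to use. */
static void update_window(struct window *w, uint32_t color)
{
    for (int i = 0; i &lt; PIXELS; i++)
        w-&gt;back[i] = color;

    uint32_t *tmp = w-&gt;front;
    w-&gt;front = w-&gt;back;   /* tell the compositor to use this buffer */
    w-&gt;back = tmp;        /* recycle the previous front buffer */
}

int main(void)
{
    struct window w = { buffers[0], buffers[1] };

    update_window(&amp;w, 0x11111111u);
    assert(w.front[0] == 0x11111111u);

    update_window(&amp;w, 0x22222222u);
    assert(w.front[0] == 0x22222222u);
    assert(w.back[0] == 0x11111111u);  /* old frame, safe to reuse */
    printf("front=0x%08x back=0x%08x\n", w.front[0], w.back[0]);
    return 0;
}
</pre>
<p>
A real client must also wait for the compositor to release the old
buffer before rendering into it again, which is exactly the race the
second strategy's back-buffer copy avoids.
</p>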
<p>
In either case, the application must tell the compositor which area
of the surface holds new contents. When the application renders
directly to the shared buffer, the compositor needs to be notified
that there is new content. Likewise, when exchanging buffers, the
compositor doesn't assume anything changed, and needs a request from
the application before it will repaint the desktop. The idea is
that even if an application passes a new buffer to the compositor,
only a small part of the buffer may be different, like a blinking
cursor or a spinner.
</p>
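<p>
Accumulating several small damaged areas into one region can be
sketched like this. In the real protocol damage is posted
per-surface with the wl_surface.damage request; this sketch only
shows how two small rectangles, such as a cursor and a spinner,
combine into one bounding box:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;stdio.h&gt;

/* A damage rectangle, as an application might report to the
 * compositor. */
struct rect {
    int x, y, width, height;
};

/* Grow an accumulated damage region to also cover a new rectangle. */
static struct rect damage_union(struct rect a, struct rect b)
{
    int x1 = a.x &lt; b.x ? a.x : b.x;
    int y1 = a.y &lt; b.y ? a.y : b.y;
    int x2 = a.x + a.width  &gt; b.x + b.width  ? a.x + a.width  : b.x + b.width;
    int y2 = a.y + a.height &gt; b.y + b.height ? a.y + a.height : b.y + b.height;
    struct rect r = { x1, y1, x2 - x1, y2 - y1 };
    return r;
}

int main(void)
{
    /* A blinking cursor and a spinner each dirty a small area. */
    struct rect cursor  = { 10, 10,  2, 16 };
    struct rect spinner = { 40, 40, 24, 24 };

    struct rect damage = damage_union(cursor, spinner);
    printf("damage: %d,%d %dx%d\n",
           damage.x, damage.y, damage.width, damage.height);
    assert(damage.x == 10 &amp;&amp; damage.y == 10);
    assert(damage.width == 54 &amp;&amp; damage.height == 54);
    return 0;
}
</pre>
<p>
Real compositors typically keep damage as a list of rectangles
rather than one bounding box, but either way the compositor only has
to recomposite the damaged region, not the whole surface.
</p>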
<h2>X as a Wayland Client</h2>
<p>
Wayland is a complete window system in itself, but even so, if we're
migrating away from X, it makes sense to have a good backwards
compatibility story. With a few changes, the Xorg server can be
modified to use wayland input devices for input and forward either
the root window or individual top-level windows as wayland surfaces.
The server still runs the same 2D driver with the same acceleration
code as it does when it runs natively; the main difference is that
wayland handles presentation of the windows instead of KMS.
</p>
<p><img src="x-on-wayland.png" alt="X on Wayland architecture diagram"></p>
</body>
</html>