<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<link href="wayland.css" rel="stylesheet" type="text/css">
<title>Wayland</title>
</head>
<body>
<h1><a href="/"><img src="wayland.png" alt="Wayland logo"></a></h1>
<h2>Wayland Architecture</h2>
<p>A good way to understand the wayland architecture and how it is
different from X is to follow an event from the input device to the
point where the change it affects appears on screen.</p>
<p>This is where we are now with X:</p>
<p><img src="x-architecture.png" alt="X architecture diagram"></p>
<ol>
<li>
The kernel gets an event from an input device and sends it to X
through the evdev input driver. The kernel does all the hard work
here by driving the device and translating the different device
specific event protocols to the Linux evdev input event standard.
</li>
<li>
The X server determines which window the event affects and sends
it to the clients that have selected for the event in question on
that window. The X server doesn't actually know how to do this
right, since the window location on screen is controlled by the
compositor and may be transformed in a number of ways that the X
server doesn't understand (scaled down, rotated, wobbling,
etc).
</li>
<li>
The client looks at the event and decides what to do. Often the
UI will have to change in response to the event - perhaps a check
box was clicked or the pointer entered a button that must be
highlighted. Thus the client sends a rendering request back to
the X server.
</li>
<li>
When the X server receives the rendering request, it sends it to
the driver to let it program the hardware to do the rendering.
The X server also calculates the bounding region of the rendering,
and sends that to the compositor as a <em>damage event</em>.
</li>
<li>
The damage event tells the compositor that something changed in
the window and that it has to recomposite the part of the screen
where that window is visible. The compositor is responsible for
rendering the entire screen contents based on its scenegraph and
the contents of the X windows. Yet, it has to go through the X
server to render this.
</li>
<li>
The X server receives the rendering requests from the compositor
and either copies the compositor back buffer to the front buffer
or does a pageflip. In the general case, the X server has to do
this step itself so that it can account for overlapping windows
(which may require clipping) and determine whether or not it can
page flip.
However, for a compositor, which is always fullscreen, this is
another unnecessary context switch.
</li>
</ol>
<p>
As suggested above, there are a few problems with this approach.
The X server doesn't have the information to decide which window
should receive the event, nor can it transform the screen
coordinates to window local coordinates. And even though X has
handed responsibility for the final painting of the screen to the
compositing manager, X still controls the front buffer and
modesetting. Most of the complexity that the X server used to
handle is now available in the kernel or in self-contained libraries
(KMS, evdev, mesa, fontconfig, freetype, cairo, Qt etc). In
general, the X server is now just a middle man that introduces an
extra step between applications and the compositor and an extra step
between the compositor and the hardware.
</p>
<p>
In wayland the compositor <em>is</em> the display server. We
transfer the control of KMS and evdev to the compositor. The
wayland protocol lets the compositor send the input events directly
to the clients and lets the client send the damage event directly to
the compositor:
</p>
<p><img src="wayland-architecture.png" alt="Wayland architecture diagram"></p>
<ol>
<li>
The kernel gets an event and sends it to the compositor. This is
similar to the X case, which is great, since we get to reuse all
the input drivers in the kernel.
</li>
<li>
The compositor looks through its scenegraph to determine which
window should receive the event. The scenegraph corresponds to
what's on screen and the compositor understands the
transformations that it may have applied to the elements in the
scenegraph. Thus, the compositor can pick the right window and
transform the screen coordinates to window local coordinates by
applying the inverse transformations. The types of transformation
that can be applied to a window are restricted only by what the
compositor can do, as long as it can compute the inverse
transformation for the input events.
</li>
<li>
As in the X case, when the client receives the event, it updates
the UI in response. But in the wayland case, the rendering
happens in the client, and the client just sends a request to the
compositor to indicate the region that was updated.
</li>
<li>
The compositor collects damage requests from its clients and then
recomposites the screen. The compositor can then directly issue
an ioctl to schedule a pageflip with KMS.
</li>
</ol>
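<p>
Step 2 above can be sketched in a few lines of C. The transform
struct and function names here are purely illustrative, not part of
any real compositor; the sketch assumes the compositor applies a
uniform scale and a translation to the window:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical per-window transform: the compositor scales the
 * window and places it at an offset on screen. */
struct window_transform {
    double scale;      /* uniform scale factor */
    double offset_x;   /* window position on screen */
    double offset_y;
};

/* Map screen coordinates to window-local coordinates by applying
 * the inverse transform: undo the translation, then the scale. */
static void screen_to_window(const struct window_transform *t,
                             double sx, double sy,
                             double *wx, double *wy)
{
    *wx = (sx - t-&gt;offset_x) / t-&gt;scale;
    *wy = (sy - t-&gt;offset_y) / t-&gt;scale;
}

int main(void)
{
    /* A window scaled to half size, placed at (100, 50) on screen. */
    struct window_transform t = { 0.5, 100.0, 50.0 };
    double wx, wy;

    /* A click at screen (150, 100) lands at (100, 100) in the window. */
    screen_to_window(&amp;t, 150.0, 100.0, &amp;wx, &amp;wy);
    printf("window-local: (%g, %g)\n", wx, wy);
    assert(fabs(wx - 100.0) &lt; 1e-9 &amp;&amp; fabs(wy - 100.0) &lt; 1e-9);
    return 0;
}
</pre>
<p>
A real compositor would walk its scenegraph with arbitrary
transformations per element, but the principle is the same: as long
as each transform is invertible, input can always be mapped back to
window-local coordinates.
</p>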
<h2>Wayland Rendering</h2>
<p>
One of the details I left out in the above overview is how clients
actually render under wayland. By removing the X server from the
picture we also removed the mechanism by which X clients typically
render. But there's another mechanism that we're already using with
DRI2 under X: <em>direct rendering</em>. With direct rendering, the
client and the server share a video memory buffer. The client links
to a rendering library such as OpenGL that knows how to program the
hardware and renders directly into the buffer. The compositor in
turn can take the buffer and use it as a texture when it composites
the desktop. After the initial setup, the client only needs to tell
the compositor which buffer to use and when and where it has
rendered new content into it.
</p>
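<p>
The essence of the shared-buffer arrangement can be sketched with
ordinary memory standing in for video memory. This is a simplified
illustration, not the real EGL/GBM or wl_buffer API: the point is
only that the client writes and the compositor reads the very same
buffer, with no copy through a display server in between:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define WIDTH  4
#define HEIGHT 4

/* A shared buffer: in reality this would live in video memory and
 * be sampled by the compositor as a GPU texture; here plain memory
 * stands in for it. */
static uint32_t shared_buffer[WIDTH * HEIGHT];

/* The client renders directly into the shared buffer. */
static void client_render(uint32_t color)
{
    for (int i = 0; i &lt; WIDTH * HEIGHT; i++)
        shared_buffer[i] = color;
}

/* The compositor "textures" from the same memory. */
static uint32_t compositor_sample(int x, int y)
{
    return shared_buffer[y * WIDTH + x];
}

int main(void)
{
    client_render(0xff0000ffu);                      /* client draws  */
    assert(compositor_sample(1, 2) == 0xff0000ffu);  /* compositor sees it */
    printf("compositor sampled 0x%08x\n", compositor_sample(1, 2));
    return 0;
}
</pre>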
<p>
This leaves an application with two ways to update its window
contents:
</p>
<ol>
<li>
Render the new content into a new buffer and tell the compositor
to use that instead of the old buffer. The application can
allocate a new buffer every time it needs to update the window
contents or it can keep two (or more) buffers around and cycle
between them. The buffer management is entirely under application
control.
</li>
<li>
Render the new content into the buffer that it previously told the
compositor to use. While it's possible to just render directly
into the buffer shared with the compositor, this might race with
the compositor. What can happen is that repainting the window
contents could be interrupted by the compositor repainting the
desktop. If the application gets interrupted just after clearing
the window but before rendering the contents, the compositor will
texture from a blank buffer. The result is that the application
window will flicker between a blank window and half-rendered
content. The traditional way to avoid this is to render the new
content into a back buffer and then copy from there into the
compositor surface. The back buffer can be allocated on the fly
and just big enough to hold the new content, or the application
can keep a buffer around. Again, this is under application
control.
</li>
</ol>
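<p>
The first strategy, cycling between two application-owned buffers,
can be sketched as follows. The struct and function names are
illustrative, not part of the wayland API; the sketch only shows the
swap logic that keeps the compositor's buffer intact while the
application renders the next frame:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define PIXELS 16

/* Two buffers the application cycles between; buffer management is
 * entirely under application control. */
static uint32_t buffers[2][PIXELS];

struct window {
    uint32_t *front;  /* buffer currently shown by the compositor */
    uint32_t *back;   /* buffer the application renders into */
};

/* Render new content into the back buffer, then swap: the old back
 * buffer is the one the compositor is now told to use. */
static void update_window(struct window *w, uint32_t color)
{
    for (int i = 0; i &lt; PIXELS; i++)
        w-&gt;back[i] = color;

    uint32_t *tmp = w-&gt;front;
    w-&gt;front = w-&gt;back;   /* tell the compositor to use this buffer */
    w-&gt;back = tmp;        /* recycle the previous front buffer */
}

int main(void)
{
    struct window w = { buffers[0], buffers[1] };

    update_window(&amp;w, 0x11111111u);
    assert(w.front[0] == 0x11111111u);

    update_window(&amp;w, 0x22222222u);
    assert(w.front[0] == 0x22222222u);
    assert(w.back[0] == 0x11111111u);  /* old frame, safe to reuse */
    printf("front=0x%08x back=0x%08x\n", w.front[0], w.back[0]);
    return 0;
}
</pre>
<p>
A real client must also wait for the compositor to release the old
buffer before rendering into it again, which is exactly the race the
second strategy's back-buffer copy avoids.
</p>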
<p>
In either case, the application must tell the compositor which area
of the surface holds new contents. When the application renders
directly to the shared buffer, the compositor needs to be notified
that there is new content. Likewise, when exchanging buffers, the
compositor doesn't assume anything changed, and needs a request from
the application before it will repaint the desktop. The idea is
that even if an application passes a new buffer to the compositor,
only a small part of the buffer may be different, like a blinking
cursor or a spinner.
</p>
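<p>
Accumulating several small damaged areas into one region can be
sketched like this. In the real protocol damage is posted
per-surface with the wl_surface.damage request; this sketch only
shows how two small rectangles, such as a cursor and a spinner,
combine into one bounding box:
</p>
<pre>
#include &lt;assert.h&gt;
#include &lt;stdio.h&gt;

/* A damage rectangle, as an application might report to the
 * compositor. */
struct rect {
    int x, y, width, height;
};

/* Grow an accumulated damage region to also cover a new rectangle. */
static struct rect damage_union(struct rect a, struct rect b)
{
    int x1 = a.x &lt; b.x ? a.x : b.x;
    int y1 = a.y &lt; b.y ? a.y : b.y;
    int x2 = a.x + a.width  &gt; b.x + b.width  ? a.x + a.width  : b.x + b.width;
    int y2 = a.y + a.height &gt; b.y + b.height ? a.y + a.height : b.y + b.height;
    struct rect r = { x1, y1, x2 - x1, y2 - y1 };
    return r;
}

int main(void)
{
    /* A blinking cursor and a spinner each dirty a small area. */
    struct rect cursor  = { 10, 10,  2, 16 };
    struct rect spinner = { 40, 40, 24, 24 };

    struct rect damage = damage_union(cursor, spinner);
    printf("damage: %d,%d %dx%d\n",
           damage.x, damage.y, damage.width, damage.height);
    assert(damage.x == 10 &amp;&amp; damage.y == 10);
    assert(damage.width == 54 &amp;&amp; damage.height == 54);
    return 0;
}
</pre>
<p>
Real compositors typically keep damage as a list of rectangles
rather than one bounding box, but either way the compositor only has
to recomposite the damaged region, not the whole surface.
</p>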
<h2>X as a Wayland Client</h2>
<p>
Wayland is a complete window system in itself, but even so, if we're
migrating away from X, it makes sense to have a good backwards
compatibility story. With a few changes, the Xorg server can be
modified to use wayland input devices for input and forward either
the root window or individual top-level windows as wayland surfaces.
The server still runs the same 2D driver with the same acceleration
code as it does when it runs natively; the main difference is that
wayland handles presentation of the windows instead of KMS.
</p>
<p><img src="x-on-wayland.png" alt="X on Wayland architecture diagram"></p>
</body>
</html>